Thera Bank recently saw a steep decline in the number of users of its credit cards. Credit cards are a good source of income for banks because of the different kinds of fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, and foreign transaction fees. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.
Customers leaving its credit card services would lead the bank to losses, so the bank wants to analyze its customer data, identify the customers who are likely to leave its credit card services, and understand the reasons why, so that it can improve in those areas.
As a data scientist at Thera Bank, you need to come up with a classification model that will help the bank improve its services so that customers do not renounce their credit cards.
In this section, we consolidate all necessary libraries. This includes everything needed for data analysis (e.g., pandas, NumPy), data visualization (e.g., matplotlib, seaborn, Plotly), and modeling (e.g., scikit-learn).
For this project, we're required to develop five models. The classification models that I've chosen are Bagging, Random Forest, AdaBoost, XGBoost, and a Stacking classifier.
import warnings
warnings.filterwarnings("ignore")
import os
# data and analysis libraries
import pandas as pd
import numpy as np
from sklearn import metrics
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
import scipy.stats as stats
from scipy.stats import uniform, randint
#import data visualization libraries
import plotly.express as px
from scipy.stats import skew
import plotly.graph_objects as go
#model libraries
from sklearn.model_selection import train_test_split, KFold, cross_val_score, cross_val_predict, StratifiedKFold
from sklearn.metrics import (
f1_score,
accuracy_score,
recall_score,
precision_score,
confusion_matrix,
roc_auc_score,
)
# data preprocessing libraries
# To be used for data scaling and encoding
from sklearn.preprocessing import (
StandardScaler,
MinMaxScaler,
OneHotEncoder,
RobustScaler,
)
# 5 models to be used: BaggingClassifier, RandomForestClassifier, AdaBoostClassifier, XGBoost, Stacking
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier, StackingClassifier
from xgboost import XGBClassifier
from sklearn.tree import DecisionTreeClassifier
#hyperparameter tuning
from sklearn.model_selection import RandomizedSearchCV
import optuna
# oversampling and undersampling data
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
#data treatment library
from sklearn.impute import SimpleImputer
from sklearn.impute import KNNImputer
from sklearn.preprocessing import LabelEncoder
# ability to import from github repository
import certifi
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
Loading the dataset. Data can come from a variety of sources; here we load it from a GitHub repository, though the same DataFrame could equally be sourced from a SQL database.
url = "https://raw.githubusercontent.com/wesliejh/Project-3---Credit-Card-Churn/main/BankChurners.csv"
bank_data = pd.read_csv(url)
data = bank_data.copy()
data.head()
|   | CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | ... | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 768805383 | Existing Customer | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | ... | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 | 42 | 1.625 | 0.061 |
| 1 | 818770008 | Existing Customer | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | ... | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 | 33 | 3.714 | 0.105 |
| 2 | 713982108 | Existing Customer | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | ... | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 | 20 | 2.333 | 0.000 |
| 3 | 769911858 | Existing Customer | 40 | F | 4 | High School | NaN | Less than $40K | Blue | 34 | ... | 4 | 1 | 3313.0 | 2517 | 796.0 | 1.405 | 1171 | 20 | 2.333 | 0.760 |
| 4 | 709106358 | Existing Customer | 40 | M | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | ... | 1 | 0 | 4716.0 | 0 | 4716.0 | 2.175 | 816 | 28 | 2.500 | 0.000 |
5 rows × 21 columns
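Since loading from a SQL database was mentioned as an alternative source, here is a minimal sketch of the same workflow through pandas' `read_sql`. The in-memory SQLite database and the table name `bank_churners` are hypothetical stand-ins for a real connection:

```python
import sqlite3

import pandas as pd

# Hypothetical setup: a tiny frame written into an in-memory SQLite table
# stands in for a real database.
conn = sqlite3.connect(":memory:")
demo = pd.DataFrame(
    {"CLIENTNUM": [768805383, 818770008], "Attrition_Flag": ["Existing Customer"] * 2}
)
demo.to_sql("bank_churners", conn, index=False)

# Same result shape as read_csv, but sourced from SQL
sql_data = pd.read_sql("SELECT * FROM bank_churners", conn)
conn.close()
```

From here, `sql_data` could be copied and explored exactly like `bank_data` above.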
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   CLIENTNUM                 10127 non-null  int64
 1   Attrition_Flag            10127 non-null  object
 2   Customer_Age              10127 non-null  int64
 3   Gender                    10127 non-null  object
 4   Dependent_count           10127 non-null  int64
 5   Education_Level           8608 non-null   object
 6   Marital_Status            9378 non-null   object
 7   Income_Category           10127 non-null  object
 8   Card_Category             10127 non-null  object
 9   Months_on_book            10127 non-null  int64
 10  Total_Relationship_Count  10127 non-null  int64
 11  Months_Inactive_12_mon    10127 non-null  int64
 12  Contacts_Count_12_mon     10127 non-null  int64
 13  Credit_Limit              10127 non-null  float64
 14  Total_Revolving_Bal       10127 non-null  int64
 15  Avg_Open_To_Buy           10127 non-null  float64
 16  Total_Amt_Chng_Q4_Q1      10127 non-null  float64
 17  Total_Trans_Amt           10127 non-null  int64
 18  Total_Trans_Ct            10127 non-null  int64
 19  Total_Ct_Chng_Q4_Q1       10127 non-null  float64
 20  Avg_Utilization_Ratio     10127 non-null  float64
dtypes: float64(5), int64(10), object(6)
memory usage: 1.6+ MB
data.shape
(10127, 21)
Observations

- Gender, Education_Level, Marital_Status, Income_Category, and Card_Category are of the object dtype.
- Education_Level and Marital_Status have missing values.

data.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| CLIENTNUM | 10127.0 | 7.391776e+08 | 3.690378e+07 | 708082083.0 | 7.130368e+08 | 7.179264e+08 | 7.731435e+08 | 8.283431e+08 |
| Customer_Age | 10127.0 | 4.632596e+01 | 8.016814e+00 | 26.0 | 4.100000e+01 | 4.600000e+01 | 5.200000e+01 | 7.300000e+01 |
| Dependent_count | 10127.0 | 2.346203e+00 | 1.298908e+00 | 0.0 | 1.000000e+00 | 2.000000e+00 | 3.000000e+00 | 5.000000e+00 |
| Months_on_book | 10127.0 | 3.592841e+01 | 7.986416e+00 | 13.0 | 3.100000e+01 | 3.600000e+01 | 4.000000e+01 | 5.600000e+01 |
| Total_Relationship_Count | 10127.0 | 3.812580e+00 | 1.554408e+00 | 1.0 | 3.000000e+00 | 4.000000e+00 | 5.000000e+00 | 6.000000e+00 |
| Months_Inactive_12_mon | 10127.0 | 2.341167e+00 | 1.010622e+00 | 0.0 | 2.000000e+00 | 2.000000e+00 | 3.000000e+00 | 6.000000e+00 |
| Contacts_Count_12_mon | 10127.0 | 2.455317e+00 | 1.106225e+00 | 0.0 | 2.000000e+00 | 2.000000e+00 | 3.000000e+00 | 6.000000e+00 |
| Credit_Limit | 10127.0 | 8.631954e+03 | 9.088777e+03 | 1438.3 | 2.555000e+03 | 4.549000e+03 | 1.106750e+04 | 3.451600e+04 |
| Total_Revolving_Bal | 10127.0 | 1.162814e+03 | 8.149873e+02 | 0.0 | 3.590000e+02 | 1.276000e+03 | 1.784000e+03 | 2.517000e+03 |
| Avg_Open_To_Buy | 10127.0 | 7.469140e+03 | 9.090685e+03 | 3.0 | 1.324500e+03 | 3.474000e+03 | 9.859000e+03 | 3.451600e+04 |
| Total_Amt_Chng_Q4_Q1 | 10127.0 | 7.599407e-01 | 2.192068e-01 | 0.0 | 6.310000e-01 | 7.360000e-01 | 8.590000e-01 | 3.397000e+00 |
| Total_Trans_Amt | 10127.0 | 4.404086e+03 | 3.397129e+03 | 510.0 | 2.155500e+03 | 3.899000e+03 | 4.741000e+03 | 1.848400e+04 |
| Total_Trans_Ct | 10127.0 | 6.485869e+01 | 2.347257e+01 | 10.0 | 4.500000e+01 | 6.700000e+01 | 8.100000e+01 | 1.390000e+02 |
| Total_Ct_Chng_Q4_Q1 | 10127.0 | 7.122224e-01 | 2.380861e-01 | 0.0 | 5.820000e-01 | 7.020000e-01 | 8.180000e-01 | 3.714000e+00 |
| Avg_Utilization_Ratio | 10127.0 | 2.748936e-01 | 2.756915e-01 | 0.0 | 2.300000e-02 | 1.760000e-01 | 5.030000e-01 | 9.990000e-01 |
Questions:

- How does the change in transaction count (Total_Ct_Chng_Q4_Q1) vary by the customer's account status (Attrition_Flag)?
- How do months of inactivity (Months_Inactive_12_mon) vary by the customer's account status (Attrition_Flag)?

# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12, 7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # number of rows of the subplot grid = 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    if bins:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins)
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # add median to the histogram
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the count/percentage
    plt.show()  # show the plot
# function to plot stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 1, 5))
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1), frameon=False)
    plt.show()
### Function to plot distributions
def distribution_plot(data, predictor, target):
    """
    Plot the predictor's distribution and boxplots for each target class
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))
    target_uniq = data[target].unique()
    axs[0, 0].set_title(
        "Distribution of " + predictor + " for " + target + "=" + str(target_uniq[0])
    )
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
    )
    axs[0, 1].set_title(
        "Distribution of " + predictor + " for " + target + "=" + str(target_uniq[1])
    )
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
    )
    axs[1, 0].set_title("Boxplot w.r.t. target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
    axs[1, 1].set_title("Boxplot (without outliers) w.r.t. target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )
    plt.tight_layout()
    plt.show()
data.duplicated().sum()
0
data.isnull().sum()/len(data)*100
CLIENTNUM                    0.000000
Attrition_Flag               0.000000
Customer_Age                 0.000000
Gender                       0.000000
Dependent_count              0.000000
Education_Level             14.999506
Marital_Status               7.396070
Income_Category              0.000000
Card_Category                0.000000
Months_on_book               0.000000
Total_Relationship_Count     0.000000
Months_Inactive_12_mon       0.000000
Contacts_Count_12_mon        0.000000
Credit_Limit                 0.000000
Total_Revolving_Bal          0.000000
Avg_Open_To_Buy              0.000000
Total_Amt_Chng_Q4_Q1         0.000000
Total_Trans_Amt              0.000000
Total_Trans_Ct               0.000000
Total_Ct_Chng_Q4_Q1          0.000000
Avg_Utilization_Ratio        0.000000
dtype: float64
Education_Level and Marital_Status have 15% and 7% missing values, respectively.

# create variable for only number dtypes
data_int = data.select_dtypes(include='number')
#create variable for all other dtypes
data_cat = data.select_dtypes(exclude='number')
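Before modeling, those two categorical gaps will need filling. A minimal sketch of mode imputation, assuming that strategy is acceptable here (`SimpleImputer(strategy="most_frequent")` from the imports above would do the same); the `impute_mode` helper and the toy frame are illustrative, not part of the pipeline:

```python
import pandas as pd

def impute_mode(df, columns):
    """Fill missing values in the given categorical columns with each column's mode."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].mode()[0])
    return out

# Toy frame mimicking the two columns with gaps
toy = pd.DataFrame(
    {
        "Education_Level": ["Graduate", None, "Graduate", "College"],
        "Marital_Status": ["Married", "Single", None, "Married"],
    }
)
filled = impute_mode(toy, ["Education_Level", "Marital_Status"])
```

KNNImputer, also imported above, is an alternative when the columns are encoded numerically.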
data_int.plot(kind='box', subplots=True, figsize=(10, 20), layout=(8, 2), sharex=False, sharey=False)
plt.show()
Observations
data_int.hist(layout = (13,3), figsize=(8, 30))
plt.show();
### Attrition_Flag

data['Attrition_Flag'].value_counts()
Attrition_Flag
Existing Customer    8500
Attrited Customer    1627
Name: count, dtype: int64
data['Attrition_Flag'].value_counts()/len(data)*100
Attrition_Flag
Existing Customer    83.934038
Attrited Customer    16.065962
Name: count, dtype: float64
labeled_barplot(data, 'Attrition_Flag')
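The roughly 84/16 split above means the target is imbalanced. SMOTE and RandomUnderSampler are imported for the modeling stage; the underlying idea can be sketched with plain random oversampling of the minority class (an assumption: this simple duplication stands in for SMOTE's synthetic samples, and the `oversample_minority` helper is illustrative only):

```python
import pandas as pd

def oversample_minority(df, target):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    counts = df[target].value_counts()
    majority, minority = counts.index[0], counts.index[-1]
    minority_rows = df[df[target] == minority]
    extra = minority_rows.sample(
        counts[majority] - counts[minority], replace=True, random_state=1
    )
    return pd.concat([df, extra], ignore_index=True)

# Toy imbalanced frame: 8 existing vs 2 attrited customers
toy = pd.DataFrame({"Attrition_Flag": ["Existing"] * 8 + ["Attrited"] * 2})
balanced = oversample_minority(toy, "Attrition_Flag")
```

Either way, resampling should be applied only to the training split, never to the test set.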
### Gender

data['Gender'].value_counts()
Gender
F    5358
M    4769
Name: count, dtype: int64
data['Gender'].value_counts()/len(data)*100
Gender
F    52.908068
M    47.091932
Name: count, dtype: float64
labeled_barplot(data, 'Gender')
### Education_Level

data['Education_Level'].value_counts()
Education_Level
Graduate         3128
High School      2013
Uneducated       1487
College          1013
Post-Graduate     516
Doctorate         451
Name: count, dtype: int64
data['Education_Level'].value_counts()/len(data)*100
Education_Level
Graduate         30.887726
High School      19.877555
Uneducated       14.683519
College          10.002962
Post-Graduate     5.095290
Doctorate         4.453441
Name: count, dtype: float64
labeled_barplot(data, 'Education_Level')
### Marital_Status

data['Marital_Status'].value_counts()
Marital_Status
Married     4687
Single      3943
Divorced     748
Name: count, dtype: int64
data['Marital_Status'].value_counts()/len(data)*100
Marital_Status
Married     46.282216
Single      38.935519
Divorced     7.386195
Name: count, dtype: float64
data['Marital_Status'].unique()
array(['Married', 'Single', nan, 'Divorced'], dtype=object)
data['Marital_Status'].isnull().sum()/len(data)*100
7.3960699121161255
labeled_barplot(data, 'Marital_Status')
### Income_Category

data['Income_Category'].value_counts()
Income_Category
Less than $40K    3561
$40K - $60K       1790
$80K - $120K      1535
$60K - $80K       1402
abc               1112
$120K +            727
Name: count, dtype: int64
data['Income_Category'].unique()
array(['$60K - $80K', 'Less than $40K', '$80K - $120K', '$40K - $60K',
'$120K +', 'abc'], dtype=object)
data['Income_Category'].value_counts()/len(data)*100
Income_Category
Less than $40K    35.163425
$40K - $60K       17.675521
$80K - $120K      15.157500
$60K - $80K       13.844179
abc               10.980547
$120K +            7.178829
Name: count, dtype: float64
labeled_barplot(data, 'Income_Category')
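The `abc` level above (about 11% of rows) is clearly a placeholder rather than an income band. One option, assuming it should be treated like the other missing values, is to recode it as NaN before imputation (the toy series below is illustrative):

```python
import numpy as np
import pandas as pd

# Toy column containing the placeholder level seen in Income_Category
toy = pd.Series(["Less than $40K", "abc", "$120K +", "abc"], name="Income_Category")

# Recode the placeholder as NaN so it is handled like other missing values
cleaned = toy.replace("abc", np.nan)
```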
### Card_Category

data['Card_Category'].value_counts()
Card_Category
Blue        9436
Silver       555
Gold         116
Platinum      20
Name: count, dtype: int64
# creating function for a quick overview of data around the quartile information
def outlier_review(data, column):
    """
    Function to review outliers in a column
    data: dataframe
    column: column name
    """
    q1 = data[column].quantile(0.25)
    q3 = data[column].quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - (1.5 * iqr)
    upper_bound = q3 + (1.5 * iqr)
    outliers = data[(data[column] < lower_bound) | (data[column] > upper_bound)][column]
    num_above = data[data[column] > upper_bound].shape[0]
    num_below = data[data[column] < lower_bound].shape[0]
    print("The number of outliers in " + column + " is " + str(outliers.count()))
    print()
    print("The upperbound value is: ", upper_bound)
    print("The lowerbound value is: ", lower_bound)
    print()
    print("The number of points above the upper bound is " + str(num_above))
    print("The number of points below the lower bound is " + str(num_below))
    print()
    print("Quick overview of outliers:\n", outliers, sep="")
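The IQR rule used by outlier_review can be sanity-checked on a toy series; the `iqr_bounds` helper below (illustrative only) re-derives the same Tukey fences:

```python
import pandas as pd

def iqr_bounds(series):
    """Return the (lower, upper) Tukey fences used by outlier_review."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

s = pd.Series([1, 2, 3, 4, 100])  # 100 is an obvious outlier
lower, upper = iqr_bounds(s)  # q1=2, q3=4, so fences are -1 and 7
n_outliers = ((s < lower) | (s > upper)).sum()
```

Any point outside the fences is flagged, so only the value 100 counts as an outlier here.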
# creating a function for a quick data review of a specific column
def data_review(data, column):
    """
    Function to review data in a column
    data: dataframe
    column: column name
    """
    print("The number of missing values is: ", data[column].isnull().sum())
    print()
    print("The number of unique values is: ", data[column].nunique())
    print()
    print("The data type is: ", data[column].dtype)
    print()
    print("The data description: \n", data[column].describe().T, sep="")
    print()
    print(
        "The percentage of data points amongst the column is:\n",
        data[column].value_counts() / len(data) * 100,
    )
def review(data, column):
    histogram_boxplot(data, column)
    outlier_review(data, column)
    data_review(data, column)
    skewness = skew(data[column])
    print(f"Skewness of {column}: {skewness}")
### CLIENTNUM

data['CLIENTNUM'].nunique()
10127
CLIENTNUM has the same number of unique values as the number of rows in data, so it is a unique identifier.

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   CLIENTNUM                 10127 non-null  int64
 1   Attrition_Flag            10127 non-null  object
 2   Customer_Age              10127 non-null  int64
 3   Gender                    10127 non-null  object
 4   Dependent_count           10127 non-null  int64
 5   Education_Level           8608 non-null   object
 6   Marital_Status            9378 non-null   object
 7   Income_Category           10127 non-null  object
 8   Card_Category             10127 non-null  object
 9   Months_on_book            10127 non-null  int64
 10  Total_Relationship_Count  10127 non-null  int64
 11  Months_Inactive_12_mon    10127 non-null  int64
 12  Contacts_Count_12_mon     10127 non-null  int64
 13  Credit_Limit              10127 non-null  float64
 14  Total_Revolving_Bal       10127 non-null  int64
 15  Avg_Open_To_Buy           10127 non-null  float64
 16  Total_Amt_Chng_Q4_Q1      10127 non-null  float64
 17  Total_Trans_Amt           10127 non-null  int64
 18  Total_Trans_Ct            10127 non-null  int64
 19  Total_Ct_Chng_Q4_Q1       10127 non-null  float64
 20  Avg_Utilization_Ratio     10127 non-null  float64
dtypes: float64(5), int64(10), object(6)
memory usage: 1.6+ MB
Reviewing info() confirms CLIENTNUM is just an identifier column, so it will be dropped.

### Customer_Age

review(data, 'Customer_Age')
The number of outliers in Customer_Age is 2

The upperbound value is:  68.5
The lowerbound value is:  24.5

The number of points above the upper bound is 2
The number of points below the lower bound is 0

Quick overview of outliers:
251    73
254    70
Name: Customer_Age, dtype: int64
The number of missing values is:  0

The number of unique values is:  45

The data type is:  int64

The data description: 
count    10127.000000
mean        46.325960
std          8.016814
min         26.000000
25%         41.000000
50%         46.000000
75%         52.000000
max         73.000000
Name: Customer_Age, dtype: float64

The percentage of data points amongst the column is:
Customer_Age
44    4.937296
49    4.887923
46    4.838550
45    4.799052
47    4.729930
43    4.670682
48    4.660808
50    4.463316
42    4.206576
51    3.930088
53    3.821467
41    3.742471
52    3.712847
40    3.564728
39    3.288239
54    3.031500
38    2.992002
55    2.755011
56    2.587143
37    2.567394
57    2.202034
36    2.182285
35    1.816925
59    1.550311
58    1.550311
34    1.441691
33    1.254073
60    1.254073
32    1.046707
65    0.997334
61    0.918337
62    0.918337
31    0.898588
26    0.770218
30    0.691221
63    0.641849
29    0.552977
64    0.424607
27    0.315987
28    0.286363
67    0.039498
66    0.019749
68    0.019749
70    0.009875
73    0.009875
Name: count, dtype: float64
Skewness of Customer_Age: -0.03360003857464426
### Dependent_count

review(data, 'Dependent_count')
The number of outliers in Dependent_count is 0

The upperbound value is:  6.0
The lowerbound value is:  -2.0

The number of points above the upper bound is 0
The number of points below the lower bound is 0

Quick overview of outliers:
Series([], Name: Dependent_count, dtype: int64)
The number of missing values is:  0

The number of unique values is:  6

The data type is:  int64

The data description: 
count    10127.000000
mean         2.346203
std          1.298908
min          0.000000
25%          1.000000
50%          2.000000
75%          3.000000
max          5.000000
Name: Dependent_count, dtype: float64

The percentage of data points amongst the column is:
Dependent_count
3    26.977387
2    26.217044
1    18.149501
4    15.542609
0     8.926632
5     4.186827
Name: count, dtype: float64
Skewness of Dependent_count: -0.02082245083419453
### Months_on_book

review(data, 'Months_on_book')
The number of outliers in Months_on_book is 386
The upperbound value is: 53.5
The lowerbound value is: 17.5
The number of points above the upper bound is 198
The number of points below the lower bound is 188
Quick overview of outliers:
11 54
18 56
27 56
39 56
52 54
..
10054 15
10062 17
10069 14
10107 54
10114 15
Name: Months_on_book, Length: 386, dtype: int64
The number of missing values is: 0
The number of unique values is: 44
The data type is: int64
The data description:
count 10127.000000
mean 35.928409
std 7.986416
min 13.000000
25% 31.000000
50% 36.000000
75% 40.000000
max 56.000000
Name: Months_on_book, dtype: float64
The percentage of data points amongst the column is:
Months_on_book
36 24.321122
37 3.535104
34 3.485731
38 3.426484
39 3.367236
40 3.288239
31 3.140120
35 3.130246
33 3.011751
30 2.962378
41 2.932754
32 2.853757
28 2.715513
43 2.695764
42 2.676015
29 2.379777
44 2.271156
45 2.241533
27 2.034166
46 1.945295
26 1.836674
47 1.688555
25 1.629308
48 1.599684
24 1.579935
49 1.392318
23 1.145453
22 1.036832
56 1.017083
50 0.947961
21 0.819591
51 0.789967
53 0.770218
20 0.730720
13 0.691221
19 0.622099
52 0.612225
18 0.572726
54 0.523353
55 0.414733
17 0.385109
15 0.335736
16 0.286363
14 0.157993
Name: count, dtype: float64
Skewness of Months_on_book: -0.1065495749017217
### Total_Relationship_Count

review(data, 'Total_Relationship_Count')
The number of outliers in Total_Relationship_Count is 0

The upperbound value is:  8.0
The lowerbound value is:  0.0

The number of points above the upper bound is 0
The number of points below the lower bound is 0

Quick overview of outliers:
Series([], Name: Total_Relationship_Count, dtype: int64)
The number of missing values is:  0

The number of unique values is:  6

The data type is:  int64

The data description: 
count    10127.000000
mean         3.812580
std          1.554408
min          1.000000
25%          3.000000
50%          4.000000
75%          5.000000
max          6.000000
Name: Total_Relationship_Count, dtype: float64

The percentage of data points amongst the column is:
Total_Relationship_Count
3    22.760936
4    18.880221
5    18.672855
6    18.425990
2    12.274119
1     8.985879
Name: count, dtype: float64
Skewness of Total_Relationship_Count: -0.16242835172024658
### Months_Inactive_12_mon

review(data, 'Months_Inactive_12_mon')
The number of outliers in Months_Inactive_12_mon is 331
The upperbound value is: 4.5
The lowerbound value is: 0.5
The number of points above the upper bound is 302
The number of points below the lower bound is 29
Quick overview of outliers:
12 6
29 0
31 5
108 0
118 6
..
9964 5
10028 5
10035 6
10049 5
10066 6
Name: Months_Inactive_12_mon, Length: 331, dtype: int64
The number of missing values is: 0
The number of unique values is: 7
The data type is: int64
The data description:
count 10127.000000
mean 2.341167
std 1.010622
min 0.000000
25% 2.000000
50% 2.000000
75% 3.000000
max 6.000000
Name: Months_Inactive_12_mon, dtype: float64
The percentage of data points amongst the column is:
Months_Inactive_12_mon
3 37.977683
2 32.408413
1 22.049965
4 4.295448
5 1.757677
6 1.224449
0 0.286363
Name: count, dtype: float64
Skewness of Months_Inactive_12_mon: 0.6329673568012449
### Contacts_Count_12_mon

review(data, 'Contacts_Count_12_mon')
The number of outliers in Contacts_Count_12_mon is 629
The upperbound value is: 4.5
The lowerbound value is: 0.5
The number of points above the upper bound is 230
The number of points below the lower bound is 399
Quick overview of outliers:
2 0
4 0
8 0
12 0
20 0
..
10101 5
10106 5
10109 5
10114 5
10120 0
Name: Contacts_Count_12_mon, Length: 629, dtype: int64
The number of missing values is: 0
The number of unique values is: 7
The data type is: int64
The data description:
count 10127.000000
mean 2.455317
std 1.106225
min 0.000000
25% 2.000000
50% 2.000000
75% 3.000000
max 6.000000
Name: Contacts_Count_12_mon, dtype: float64
The percentage of data points amongst the column is:
Contacts_Count_12_mon
3 33.376123
2 31.865311
1 14.802014
4 13.745433
0 3.939962
5 1.737928
6 0.533228
Name: count, dtype: float64
Skewness of Contacts_Count_12_mon: 0.011003996010760743
### Credit_Limit

review(data, 'Credit_Limit')
The number of outliers in Credit_Limit is 984
The upperbound value is: 23836.25
The lowerbound value is: -10213.75
The number of points above the upper bound is 984
The number of points below the lower bound is 0
Quick overview of outliers:
6 34516.0
7 29081.0
16 30367.0
40 32426.0
45 34516.0
...
10098 34516.0
10100 29808.0
10104 29663.0
10110 34516.0
10112 34516.0
Name: Credit_Limit, Length: 984, dtype: float64
The number of missing values is: 0
The number of unique values is: 6205
The data type is: float64
The data description:
count 10127.000000
mean 8631.953698
std 9088.776650
min 1438.300000
25% 2555.000000
50% 4549.000000
75% 11067.500000
max 34516.000000
Name: Credit_Limit, dtype: float64
The percentage of data points amongst the column is:
Credit_Limit
34516.0 5.016293
1438.3 5.006418
9959.0 0.177743
15987.0 0.177743
23981.0 0.118495
...
9183.0 0.009875
29923.0 0.009875
9551.0 0.009875
11558.0 0.009875
10388.0 0.009875
Name: count, Length: 6205, dtype: float64
Skewness of Credit_Limit: 1.6664789242587705
### Total_Revolving_Bal

review(data, 'Total_Revolving_Bal')
The number of outliers in Total_Revolving_Bal is 0
The upperbound value is: 3921.5
The lowerbound value is: -1778.5
The number of points above the upper bound is 0
The number of points below the lower bound is 0
Quick overview of outliers:
Series([], Name: Total_Revolving_Bal, dtype: int64)
The number of missing values is: 0
The number of unique values is: 1974
The data type is: int64
The data description:
count 10127.000000
mean 1162.814061
std 814.987335
min 0.000000
25% 359.000000
50% 1276.000000
75% 1784.000000
max 2517.000000
Name: Total_Revolving_Bal, dtype: float64
The percentage of data points amongst the column is:
Total_Revolving_Bal
0 24.390244
2517 5.016293
1965 0.118495
1480 0.118495
1434 0.108621
...
2467 0.009875
2131 0.009875
2400 0.009875
2144 0.009875
2241 0.009875
Name: count, Length: 1974, dtype: float64
Skewness of Total_Revolving_Bal: -0.14881520376464566
### Avg_Open_To_Buy

review(data, 'Avg_Open_To_Buy')
The number of outliers in Avg_Open_To_Buy is 963
The upperbound value is: 22660.75
The lowerbound value is: -11477.25
The number of points above the upper bound is 963
The number of points below the lower bound is 0
Quick overview of outliers:
6 32252.0
7 27685.0
16 28005.0
40 31848.0
45 34516.0
...
10100 29808.0
10103 22754.0
10104 27920.0
10110 33425.0
10112 34516.0
Name: Avg_Open_To_Buy, Length: 963, dtype: float64
The number of missing values is: 0
The number of unique values is: 6813
The data type is: float64
The data description:
count 10127.000000
mean 7469.139637
std 9090.685324
min 3.000000
25% 1324.500000
50% 3474.000000
75% 9859.000000
max 34516.000000
Name: Avg_Open_To_Buy, dtype: float64
The percentage of data points amongst the column is:
Avg_Open_To_Buy
1438.3 3.199368
34516.0 0.967710
31999.0 0.256739
787.0 0.078997
701.0 0.069122
...
6543.0 0.009875
2808.0 0.009875
21549.0 0.009875
6189.0 0.009875
8427.0 0.009875
Name: count, Length: 6813, dtype: float64
Skewness of Avg_Open_To_Buy: 1.6614504071556497
### Total_Amt_Chng_Q4_Q1

review(data, 'Total_Amt_Chng_Q4_Q1')
The number of outliers in Total_Amt_Chng_Q4_Q1 is 396
The upperbound value is: 1.201
The lowerbound value is: 0.28900000000000003
The number of points above the upper bound is 348
The number of points below the lower bound is 48
Quick overview of outliers:
0 1.335
1 1.541
2 2.594
3 1.405
4 2.175
...
9793 0.225
9808 0.202
9963 0.222
10008 0.204
10119 0.166
Name: Total_Amt_Chng_Q4_Q1, Length: 396, dtype: float64
The number of missing values is: 0
The number of unique values is: 1158
The data type is: float64
The data description:
count 10127.000000
mean 0.759941
std 0.219207
min 0.000000
25% 0.631000
50% 0.736000
75% 0.859000
max 3.397000
Name: Total_Amt_Chng_Q4_Q1, dtype: float64
The percentage of data points amongst the column is:
Total_Amt_Chng_Q4_Q1
0.791 0.355485
0.712 0.335736
0.743 0.335736
0.718 0.325862
0.735 0.325862
...
1.216 0.009875
1.645 0.009875
1.089 0.009875
2.103 0.009875
0.166 0.009875
Name: count, Length: 1158, dtype: float64
Skewness of Total_Amt_Chng_Q4_Q1: 1.7318068495622156
### Total_Trans_Amt

review(data, 'Total_Trans_Amt')
The number of outliers in Total_Trans_Amt is 896
The upperbound value is: 8619.25
The lowerbound value is: -1722.75
The number of points above the upper bound is 896
The number of points below the lower bound is 0
Quick overview of outliers:
8591 8693
8650 8947
8670 8854
8708 8796
8734 8778
...
10121 14596
10122 15476
10123 8764
10124 10291
10126 10294
Name: Total_Trans_Amt, Length: 896, dtype: int64
The number of missing values is: 0
The number of unique values is: 5033
The data type is: int64
The data description:
count 10127.000000
mean 4404.086304
std 3397.129254
min 510.000000
25% 2155.500000
50% 3899.000000
75% 4741.000000
max 18484.000000
Name: Total_Trans_Amt, dtype: float64
The percentage of data points amongst the column is:
Total_Trans_Amt
4253 0.108621
4509 0.108621
4518 0.098746
2229 0.098746
4220 0.088871
...
1274 0.009875
4521 0.009875
3231 0.009875
4394 0.009875
10294 0.009875
Name: count, Length: 5033, dtype: float64
Skewness of Total_Trans_Amt: 2.0407010789778317
### Total_Trans_Ct

review(data, 'Total_Trans_Ct')
The number of outliers in Total_Trans_Ct is 2
The upperbound value is: 135.0
The lowerbound value is: -9.0
The number of points above the upper bound is 2
The number of points below the lower bound is 0
Quick overview of outliers:
9324 139
9586 138
Name: Total_Trans_Ct, dtype: int64
The number of missing values is: 0
The number of unique values is: 126
The data type is: int64
The data description:
count 10127.000000
mean 64.858695
std 23.472570
min 10.000000
25% 45.000000
50% 67.000000
75% 81.000000
max 139.000000
Name: Total_Trans_Ct, dtype: float64
The percentage of data points amongst the column is:
Total_Trans_Ct
81 2.053915
71 2.004542
75 2.004542
69 1.994668
82 1.994668
...
11 0.019749
134 0.009875
139 0.009875
138 0.009875
132 0.009875
Name: count, Length: 126, dtype: float64
Skewness of Total_Trans_Ct: 0.1536503056777963
### Total_Ct_Chng_Q4_Q1

review(data, 'Total_Ct_Chng_Q4_Q1')
The number of outliers in Total_Ct_Chng_Q4_Q1 is 394
The upperbound value is: 1.172
The lowerbound value is: 0.22799999999999998
The number of points above the upper bound is 298
The number of points below the lower bound is 96
Quick overview of outliers:
0 1.625
1 3.714
2 2.333
3 2.333
4 2.500
...
9388 0.176
9672 1.294
9856 1.211
9917 1.207
9977 1.684
Name: Total_Ct_Chng_Q4_Q1, Length: 394, dtype: float64
The number of missing values is: 0
The number of unique values is: 830
The data type is: float64
The data description:
count 10127.000000
mean 0.712222
std 0.238086
min 0.000000
25% 0.582000
50% 0.702000
75% 0.818000
max 3.714000
Name: Total_Ct_Chng_Q4_Q1, dtype: float64
The percentage of data points amongst the column is:
Total_Ct_Chng_Q4_Q1
0.667 1.688555
1.000 1.639182
0.500 1.589809
0.750 1.540436
0.600 1.115829
...
0.827 0.009875
0.343 0.009875
1.579 0.009875
0.125 0.009875
0.359 0.009875
Name: count, Length: 830, dtype: float64
Skewness of Total_Ct_Chng_Q4_Q1: 2.063724833411372
### Avg_Utilization_Ratio

review(data, 'Avg_Utilization_Ratio')
The number of outliers in Avg_Utilization_Ratio is 0
The upperbound value is: 1.2229999999999999
The lowerbound value is: -0.697
The number of points above the upper bound is 0
The number of points below the lower bound is 0
Quick overview of outliers:
Series([], Name: Avg_Utilization_Ratio, dtype: float64)
The number of missing values is: 0
The number of unique values is: 964
The data type is: float64
The data description:
count 10127.000000
mean 0.274894
std 0.275691
min 0.000000
25% 0.023000
50% 0.176000
75% 0.503000
max 0.999000
Name: Avg_Utilization_Ratio, dtype: float64
The percentage of data points amongst the column is:
Avg_Utilization_Ratio
0.000 24.390244
0.073 0.434482
0.057 0.325862
0.048 0.315987
0.060 0.296238
...
0.927 0.009875
0.935 0.009875
0.954 0.009875
0.385 0.009875
0.009 0.009875
Name: count, Length: 964, dtype: float64
Skewness of Avg_Utilization_Ratio: 0.7179016418496336
fig = px.imshow(data_int.corr(), text_auto=True, template='plotly_dark', color_continuous_scale=px.colors.sequential.Blues, aspect='auto', title='<b>Correlation Matrix</b>')
fig.update_layout(title_x=0.5)
fig.show()
sns.pairplot(data_int, diag_kind='kde', kind='scatter', palette='husl')
plt.show();
Initial Observations
Avg_Open_To_Buy and Credit_Limit appear strongly correlated; if so, it may not be worthwhile to keep both.

Attrition_Flag Versus

Avg_Open_To_Buy

distribution_plot(data, 'Avg_Open_To_Buy', 'Attrition_Flag')
Gender

stacked_barplot(data, 'Gender', 'Attrition_Flag')
Attrition_Flag  Attrited Customer  Existing Customer    All
Gender
All                          1627               8500  10127
F                             930               4428   5358
M                             697               4072   4769
------------------------------------------------------------
Credit_Limit

distribution_plot(data, 'Credit_Limit', 'Attrition_Flag')
Avg_Utilization_Ratio

distribution_plot(data, 'Avg_Utilization_Ratio', 'Attrition_Flag')
Contacts_Count_12_mon

stacked_barplot(data, 'Contacts_Count_12_mon', 'Attrition_Flag')
Attrition_Flag         Attrited Customer  Existing Customer    All
Contacts_Count_12_mon
All                                 1627               8500  10127
3                                    681               2699   3380
2                                    403               2824   3227
4                                    315               1077   1392
1                                    108               1391   1499
5                                     59                117    176
6                                     54                  0     54
0                                      7                392    399
------------------------------------------------------------
Months_on_book

distribution_plot(data, 'Months_on_book', 'Attrition_Flag')
Dependent_count

stacked_barplot(data, 'Dependent_count', 'Attrition_Flag')
Attrition_Flag   Attrited Customer  Existing Customer    All
Dependent_count
All                           1627               8500  10127
3                              482               2250   2732
2                              417               2238   2655
1                              269               1569   1838
4                              260               1314   1574
0                              135                769    904
5                               64                360    424
------------------------------------------------------------
Income_Category

stacked_barplot(data, 'Income_Category', 'Attrition_Flag')
Attrition_Flag   Attrited Customer  Existing Customer    All
Income_Category
All                           1627               8500  10127
Less than $40K                 612               2949   3561
$40K - $60K                    271               1519   1790
$80K - $120K                   242               1293   1535
$60K - $80K                    189               1213   1402
abc                            187                925   1112
$120K +                        126                601    727
------------------------------------------------------------
Customer_Age

distribution_plot(data, 'Customer_Age', 'Attrition_Flag')
Total_Ct_Chng_Q4_Q1

distribution_plot(data, 'Total_Ct_Chng_Q4_Q1', 'Attrition_Flag')
Months_Inactive_12_mon

stacked_barplot(data, 'Months_Inactive_12_mon', 'Attrition_Flag')
Attrition_Flag          Attrited Customer  Existing Customer    All
Months_Inactive_12_mon
All                                  1627               8500  10127
3                                     826               3020   3846
2                                     505               2777   3282
4                                     130                305    435
1                                     100               2133   2233
5                                      32                146    178
6                                      19                105    124
0                                      15                 14     29
------------------------------------------------------------
distribution_plot(data, 'Total_Trans_Ct', 'Attrition_Flag')
distribution_plot(data, 'Total_Revolving_Bal', 'Attrition_Flag')
stacked_barplot(data, 'Total_Relationship_Count', 'Attrition_Flag')
Attrition_Flag            Attrited Customer  Existing Customer    All
Total_Relationship_Count
All                                    1627               8500  10127
3                                       400               1905   2305
2                                       346                897   1243
1                                       233                677    910
5                                       227               1664   1891
4                                       225               1687   1912
6                                       196               1670   1866
------------------------------------------------------------
distribution_plot(data, 'Total_Trans_Amt', 'Attrition_Flag')
distribution_plot(data, 'Avg_Utilization_Ratio', 'Attrition_Flag')
Education_Level and Marital_Status contain missing values. Income_Category does not report any missing values, but it does contain an 'abc' value that needs to be treated.

Early Indicators of Attrition
Contacts_Count_12_mon showed the strongest relationship with attrition: every customer who was contacted 6 times in the last 12 months attrited.
- Total_Trans_Ct, Total_Revolving_Bal, Total_Trans_Amt, and Avg_Utilization_Ratio are all strong indicators of potential attrition.
- CLIENTNUM: as this is just a unique identifier, it won't have any impact on the models.
- Avg_Open_To_Buy: strongly correlated with Credit_Limit, which is more useful for a credit card analysis; therefore we drop Avg_Open_To_Buy.
- Comparing Attrition_Flag against the remaining columns, the distributions change little, if at all.

In this section, we need to clean up the data for processing.
Attrition_Flag

data['Attrition_Flag'].unique()
array(['Existing Customer', 'Attrited Customer'], dtype=object)
data['Attrition_Flag'] = data['Attrition_Flag'].replace({'Existing Customer': 1, 'Attrited Customer': 0})
data
| CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | ... | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 768805383 | 1 | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | ... | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 | 42 | 1.625 | 0.061 |
| 1 | 818770008 | 1 | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | ... | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 | 33 | 3.714 | 0.105 |
| 2 | 713982108 | 1 | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | ... | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 | 20 | 2.333 | 0.000 |
| 3 | 769911858 | 1 | 40 | F | 4 | High School | NaN | Less than $40K | Blue | 34 | ... | 4 | 1 | 3313.0 | 2517 | 796.0 | 1.405 | 1171 | 20 | 2.333 | 0.760 |
| 4 | 709106358 | 1 | 40 | M | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | ... | 1 | 0 | 4716.0 | 0 | 4716.0 | 2.175 | 816 | 28 | 2.500 | 0.000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10122 | 772366833 | 1 | 50 | M | 2 | Graduate | Single | $40K - $60K | Blue | 40 | ... | 2 | 3 | 4003.0 | 1851 | 2152.0 | 0.703 | 15476 | 117 | 0.857 | 0.462 |
| 10123 | 710638233 | 0 | 41 | M | 2 | NaN | Divorced | $40K - $60K | Blue | 25 | ... | 2 | 3 | 4277.0 | 2186 | 2091.0 | 0.804 | 8764 | 69 | 0.683 | 0.511 |
| 10124 | 716506083 | 0 | 44 | F | 1 | High School | Married | Less than $40K | Blue | 36 | ... | 3 | 4 | 5409.0 | 0 | 5409.0 | 0.819 | 10291 | 60 | 0.818 | 0.000 |
| 10125 | 717406983 | 0 | 30 | M | 2 | Graduate | NaN | $40K - $60K | Blue | 36 | ... | 3 | 3 | 5281.0 | 0 | 5281.0 | 0.535 | 8395 | 62 | 0.722 | 0.000 |
| 10126 | 714337233 | 0 | 43 | F | 2 | Graduate | Married | Less than $40K | Silver | 25 | ... | 2 | 4 | 10388.0 | 1961 | 8427.0 | 0.703 | 10294 | 61 | 0.649 | 0.189 |
10127 rows × 21 columns
We need to review the three different features more in depth to understand how we can treat null values:
- Education_Level
- Marital_Status
- Income_Category: not technically missing, but the 'abc' value doesn't fit the rest of the provided values

encoder = LabelEncoder()
imputer = KNNImputer(n_neighbors=10)
def value_imputation(data, column):
# create dataframe with only the integer columns and the column with nan values that need to be addressed
data_encoded = pd.concat([data_int, data[column]], axis = 1)
# drop the null rows so that LabelEncoder doesn't encode them; if NaNs were
# encoded, KNNImputer would skip them since they would already have a value
not_null = data_encoded[column].notnull()
data_encoded_not_null = data_encoded[not_null].copy()
# encode the column
data_encoded_not_null[column] = encoder.fit_transform(data_encoded_not_null[column])
# create data frame with the null values so we can join them back into the dataframe
null_rows = data_encoded[column].isnull()
data_encoded_null= data_encoded[null_rows]
# join them back together and sort by index
data_encoded = pd.concat([data_encoded_not_null, data_encoded_null])
data_encoded = data_encoded.sort_index()
# impute values using KNNImputer, set to 10 nearest neighbors. Using as much of the data from other columns as possible
data_impute = imputer.fit_transform(data_encoded)
data_impute = pd.DataFrame(data_impute, columns=data_encoded.columns)
# Using CLIENTNUM to insert new values back into original dataframe
data.set_index('CLIENTNUM', inplace=True)
data_impute.set_index('CLIENTNUM', inplace=True)
data[column] = data_impute[column]
data.reset_index(inplace=True)
data_impute.reset_index(inplace=True)
# round the values output by KNNImputer so we can map them back to their original string labels
data[column] = data[column].round().astype(int)
data[column] = encoder.inverse_transform(data[column])
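The encode → impute → round → inverse-transform cycle in value_imputation can be sketched end to end on a toy frame (the column names here are illustrative, not from the bank data):

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import LabelEncoder

# toy frame: one numeric helper column and a categorical column with a gap
df = pd.DataFrame({"spend": [10.0, 11.0, 50.0, 51.0, 52.0],
                   "level": ["low", "low", "high", "high", None]})

enc = LabelEncoder()
mask = df["level"].notnull()
codes = pd.Series(np.nan, index=df.index)
codes[mask] = enc.fit_transform(df.loc[mask, "level"])

# impute the missing code from the 2 nearest neighbours on "spend"
imputed = KNNImputer(n_neighbors=2).fit_transform(
    np.column_stack([df["spend"], codes]))

# round the imputed code back to an integer, then invert to the label
df["level"] = enc.inverse_transform(imputed[:, 1].round().astype(int))
print(df["level"].tolist())   # the gap next to the "high" cluster becomes "high"
```

The rounding step is what makes the inverse transform possible: KNNImputer averages neighbour codes into a float, which only maps back to a label after being snapped to an integer class code.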
Education_Level

value_imputation(data, 'Education_Level')
data['Education_Level'].unique()
array(['High School', 'Graduate', 'Uneducated', 'College',
'Post-Graduate', 'Doctorate'], dtype=object)
We compared Education_Level against the rows with NaN values to try to conclude what the correct missing value should be, but the comparison was inconclusive.

Marital_Status

Choosing to use the same KNNImputer methodology for Marital_Status
value_imputation(data, 'Marital_Status')
data['Marital_Status'].unique()
array(['Married', 'Single', 'Divorced'], dtype=object)
Income_Category

data['Income_Category'] = data['Income_Category'].replace({'abc': np.nan})
data['Income_Category'].unique()
array(['$60K - $80K', 'Less than $40K', '$80K - $120K', '$40K - $60K',
'$120K +', nan], dtype=object)
value_imputation(data, 'Income_Category')
data['Income_Category'].unique()
array(['$60K - $80K', 'Less than $40K', '$80K - $120K', '$40K - $60K',
'$120K +'], dtype=object)
data.isnull().sum()
CLIENTNUM                   0
Attrition_Flag              0
Customer_Age                0
Gender                      0
Dependent_count             0
Education_Level             0
Marital_Status              0
Income_Category             0
Card_Category               0
Months_on_book              0
Total_Relationship_Count    0
Months_Inactive_12_mon      0
Contacts_Count_12_mon       0
Credit_Limit                0
Total_Revolving_Bal         0
Avg_Open_To_Buy             0
Total_Amt_Chng_Q4_Q1        0
Total_Trans_Amt             0
Total_Trans_Ct              0
Total_Ct_Chng_Q4_Q1         0
Avg_Utilization_Ratio       0
dtype: int64
- CLIENTNUM: dropping, since it only contains unique identifiers
- Avg_Open_To_Buy: nearly perfectly correlated with Credit_Limit (99%); we don't need to keep both

data.drop(['CLIENTNUM', 'Avg_Open_To_Buy'], axis = 1, inplace = True)
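The redundancy check behind dropping Avg_Open_To_Buy can be written generically: flag any feature whose correlation with an earlier feature exceeds a threshold. A sketch on synthetic columns (the near-copy relationship between the two credit columns is simulated here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
limit = rng.uniform(1_000, 35_000, size=500)
frame = pd.DataFrame({
    "Credit_Limit": limit,
    "Avg_Open_To_Buy": limit - rng.uniform(0, 2_500, size=500),  # near-copy
    "Total_Trans_Ct": rng.integers(10, 140, size=500),
})

# flag any feature correlated above 0.95 with an earlier feature
corr = frame.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [col for col in upper.columns if (upper[col] > 0.95).any()]
print(redundant)   # → ['Avg_Open_To_Buy']
```

Masking to the upper triangle keeps only one member of each correlated pair, which matches the decision above: keep Credit_Limit, drop its near-duplicate.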
Let's define a function to output different metrics (including recall) on the train and test set and a function to show confusion matrix so that we do not have to use the same code repetitively while evaluating models.
data_cat = data.select_dtypes(exclude='number').columns.tolist()
data=pd.get_dummies(data, columns=data_cat)
data.head()
| Attrition_Flag | Customer_Age | Dependent_count | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Total_Amt_Chng_Q4_Q1 | ... | Marital_Status_Single | Income_Category_$120K + | Income_Category_$40K - $60K | Income_Category_$60K - $80K | Income_Category_$80K - $120K | Income_Category_Less than $40K | Card_Category_Blue | Card_Category_Gold | Card_Category_Platinum | Card_Category_Silver | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 45 | 3 | 39 | 5 | 1 | 3 | 12691.0 | 777 | 1.335 | ... | False | False | False | True | False | False | True | False | False | False |
| 1 | 1 | 49 | 5 | 44 | 6 | 1 | 2 | 8256.0 | 864 | 1.541 | ... | True | False | False | False | False | True | True | False | False | False |
| 2 | 1 | 51 | 3 | 36 | 4 | 1 | 0 | 3418.0 | 0 | 2.594 | ... | False | False | False | False | True | False | True | False | False | False |
| 3 | 1 | 40 | 4 | 34 | 3 | 4 | 1 | 3313.0 | 2517 | 1.405 | ... | True | False | False | False | False | True | True | False | False | False |
| 4 | 1 | 40 | 3 | 21 | 5 | 1 | 0 | 4716.0 | 0 | 2.175 | ... | False | False | False | True | False | False | True | False | False | False |
5 rows × 34 columns
X = data.drop('Attrition_Flag', axis = 1)
y = data['Attrition_Flag']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.30, random_state=1,stratify=y)
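Because the classes are imbalanced, stratify=y keeps the class ratio identical across the train and test splits; a quick check with a synthetic target (~85/15 imbalance, close to the bank data's ~84/16):

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([1] * 850 + [0] * 150)   # imbalanced toy target
X = np.arange(len(y)).reshape(-1, 1)  # dummy feature

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y)

print(y_tr.mean(), y_te.mean())   # both splits keep 85% positives
```

Without stratify, a random 30% split could shift the minority share noticeably, which would make recall estimates on the test set noisier.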
Sample code for model building with original data
def confusion_matrix_sklearn(title, model, predictors, target):
"""
To plot the confusion_matrix with percentages
model: classifier
predictors: independent variables
target: dependent variable
"""
y_pred = model.predict(predictors)
cm = confusion_matrix(target, y_pred)
labels = np.asarray(
[
["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
for item in cm.flatten()
]
).reshape(2, 2)
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=labels, fmt="", cmap='Blues')
plt.title(title)
plt.ylabel("True label")
plt.xlabel("Predicted label")
results = pd.DataFrame(columns=["Model", "Training Performance (Original)", "Training Performance (Over)", "Training Performance (Under)", "Testing Performance (Original)", "Testing Performance (Over)", "Testing Performance (Under)"])
models = [] # Empty list to store all the models
bagging_model = ("Bagging", BaggingClassifier(random_state=1))
random_forest_model = ("RandomForest", RandomForestClassifier(random_state=1))
adaboost_model = ("AdaBoost", AdaBoostClassifier(random_state=1))
xgboost_model = ("XGBoost", XGBClassifier(random_state=1))
# Appending models into the list
models.append(bagging_model)
models.append(random_forest_model)
models.append(adaboost_model)
models.append(xgboost_model)
print("\n" "Training Performance:" "\n")
for name, model in models:
model.fit(X_train, y_train)
scores = recall_score(y_train, model.predict(X_train))
training = pd.DataFrame([(name, scores)], columns=['Model', 'Training Performance (Original)'])
results = pd.concat([results, training], ignore_index = True)
print("{}: {}".format(name, scores))
stacking_model = StackingClassifier(estimators=[bagging_model, xgboost_model, adaboost_model], final_estimator=random_forest_model[1])
stacking_model.fit(X_train, y_train)
stacking_score = recall_score(y_train, stacking_model.predict(X_train))
training = pd.DataFrame([("Stacking", stacking_score)], columns=['Model', 'Training Performance (Original)'])
results = pd.concat([results, training], ignore_index = True)
print("Stacking: {}".format(stacking_score))
print("\n" "Validation Performance:" "\n")
for name, model in models:
model.fit(X_train, y_train)
scores_val = recall_score(y_test, model.predict(X_test))
results.loc[results['Model'] == name, 'Testing Performance (Original)'] = scores_val
print("{}: {}".format(name, scores_val))
stacking_val = recall_score(y_test, stacking_model.predict(X_test))
results.loc[results['Model'] == 'Stacking', 'Testing Performance (Original)'] = stacking_val
print("Stacking: {}".format(stacking_val))
for name, model in models:
confusion_matrix_sklearn(name, model, X_test, y_test)
confusion_matrix_sklearn("Stacking", stacking_model, X_test, y_test)
Training Performance:

Bagging: 0.9976466633047572
RandomForest: 1.0
AdaBoost: 0.9813414019162885
XGBoost: 1.0
Stacking: 0.998150949739452

Validation Performance:

Bagging: 0.9753038024304195
RandomForest: 0.9890239121912975
AdaBoost: 0.981967855742846
XGBoost: 0.9870638965111721
Stacking: 0.9835358682869463
RandomForest had the fewest false negatives, with XGBoost coming in second. Stacking surprisingly came third, even though it leverages the predictions of the other models. XGBoost was initially used as the final estimator, but after assessing performance this was changed to RandomForest, which reduced the false negatives by 10.

# Synthetic Minority Over-Sampling Technique (SMOTE)
sm = SMOTE(sampling_strategy=1, k_neighbors=5, random_state=1)
X_train_over, y_train_over = sm.fit_resample(X_train, y_train)
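SMOTE balances the classes by synthesizing new minority points rather than duplicating rows: each synthetic point is an interpolation between a minority sample and one of its k nearest minority neighbours. A minimal sketch of that interpolation step (not the imblearn implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
X_minority = np.array([[1.0, 2.0],
                       [1.5, 2.5],
                       [2.0, 2.0]])

def smote_like_sample(X, i, j, rng):
    """Synthesize a point on the segment between minority rows i and j."""
    gap = rng.random()                  # uniform in [0, 1)
    return X[i] + gap * (X[j] - X[i])

synthetic = smote_like_sample(X_minority, 0, 1, rng)
print(synthetic)   # lies between the two parent points, coordinate-wise
```

With sampling_strategy=1, imblearn repeats this until the minority class matches the majority class in size.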
models_over = [] # Empty list to store all the models_over
bagging_model = ("Bagging", BaggingClassifier(random_state=1))
random_forest_model = ("RandomForest", RandomForestClassifier(random_state=1))
adaboost_model = ("AdaBoost", AdaBoostClassifier(random_state=1))
xgboost_model = ("XGBoost", XGBClassifier(random_state=1))
# Appending models_over into the list
models_over.append(bagging_model)
models_over.append(random_forest_model)
models_over.append(adaboost_model)
models_over.append(xgboost_model)
print("\n" "Training Performance:" "\n")
for name, model in models_over:
model.fit(X_train_over, y_train_over)
scores = recall_score(y_train_over, model.predict(X_train_over))
results.loc[results['Model'] == name, 'Training Performance (Over)'] = scores
print("{}: {}".format(name, scores))
stacking_model = StackingClassifier(estimators=[bagging_model, xgboost_model, adaboost_model], final_estimator=random_forest_model[1])
stacking_model.fit(X_train_over, y_train_over)
stacking_score = recall_score(y_train_over, stacking_model.predict(X_train_over))
results.loc[results['Model'] == 'Stacking', 'Training Performance (Over)'] = stacking_score
print("Stacking: {}".format(stacking_score))
print("\n" "Validation Performance:" "\n")
for name, model in models_over:
model.fit(X_train_over, y_train_over)
scores_val = recall_score(y_test, model.predict(X_test))
results.loc[results['Model'] == name, 'Testing Performance (Over)'] = scores_val
print("{}: {}".format(name, scores_val))
stacking_val = recall_score(y_test, stacking_model.predict(X_test))
results.loc[results['Model'] == 'Stacking', 'Testing Performance (Over)'] = stacking_val
print("Stacking: {}".format(stacking_val))
for name, model in models_over:
confusion_matrix_sklearn(name, model, X_test, y_test)
confusion_matrix_sklearn("Stacking", stacking_model, X_test, y_test)
Training Performance:

Bagging: 0.9961338040006724
RandomForest: 1.0
AdaBoost: 0.958480416876786
XGBoost: 1.0
Stacking: 0.9966380904353673

Validation Performance:

Bagging: 0.9588396707173658
RandomForest: 0.9741277930223442
AdaBoost: 0.9541356330850647
XGBoost: 0.9847118776950216
Stacking: 0.973343786750294
# Random undersampler for under sampling the data
rus = RandomUnderSampler(random_state=1, sampling_strategy=1)
X_train_un, y_train_un = rus.fit_resample(X_train, y_train)
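Random undersampling takes the opposite approach: it keeps every minority row and discards majority rows until the classes match. The idea in plain numpy (not the imblearn API):

```python
import numpy as np

rng = np.random.default_rng(1)
y = np.array([1] * 20 + [0] * 5)   # imbalanced toy target

maj_idx = np.flatnonzero(y == 1)
min_idx = np.flatnonzero(y == 0)

# keep every minority row, draw an equal-sized majority subsample
keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
balanced_idx = np.sort(np.concatenate([keep_maj, min_idx]))

y_balanced = y[balanced_idx]
print(np.bincount(y_balanced))   # → [5 5]
```

The cost is information loss: here 15 of 20 majority rows are thrown away, which is consistent with the weaker validation recall the undersampled models show below.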
models_under = [] # Empty list to store all the models_under
bagging_model = ("Bagging", BaggingClassifier(random_state=1))
random_forest_model = ("RandomForest", RandomForestClassifier(random_state=1))
adaboost_model = ("AdaBoost", AdaBoostClassifier(random_state=1))
xgboost_model = ("XGBoost", XGBClassifier(random_state=1))
# Appending models_under into the list
models_under.append(bagging_model)
models_under.append(random_forest_model)
models_under.append(adaboost_model)
models_under.append(xgboost_model)
print("\n" "Training Performance:" "\n")
for name, model in models_under:
model.fit(X_train_un, y_train_un)
scores = recall_score(y_train_un, model.predict(X_train_un))
results.loc[results['Model'] == name, 'Training Performance (Under)'] = scores
print("{}: {}".format(name, scores))
stacking_model = StackingClassifier(estimators=[bagging_model, xgboost_model, adaboost_model], final_estimator=random_forest_model[1])
stacking_model.fit(X_train_un, y_train_un)
stacking_score = recall_score(y_train_un, stacking_model.predict(X_train_un))
results.loc[results['Model'] == 'Stacking', 'Training Performance (Under)'] = stacking_score
print("Stacking: {}".format(stacking_score))
print("\n" "Validation Performance:" "\n")
for name, model in models_under:
model.fit(X_train_un, y_train_un)
scores_val = recall_score(y_test, model.predict(X_test))
results.loc[results['Model'] == name, 'Testing Performance (Under)'] = scores_val
print("{}: {}".format(name, scores_val))
stacking_val = recall_score(y_test, stacking_model.predict(X_test))
results.loc[results['Model'] == 'Stacking', 'Testing Performance (Under)'] = stacking_val
print("Stacking: {}".format(stacking_val))
for name, model in models_under:
confusion_matrix_sklearn(name, model, X_test, y_test)
confusion_matrix_sklearn("Stacking", stacking_model, X_test, y_test)
Training Performance:

Bagging: 0.9938542581211589
RandomForest: 1.0
AdaBoost: 0.9438103599648815
XGBoost: 1.0
Stacking: 0.9982440737489026

Validation Performance:

Bagging: 0.915327322618581
RandomForest: 0.9255194041552333
AdaBoost: 0.9239513916111329
XGBoost: 0.9502156017248138
Stacking: 0.9447275578204626
results.T
| 0 | 1 | 2 | 3 | 4 | |
|---|---|---|---|---|---|
| Model | Bagging | RandomForest | AdaBoost | XGBoost | Stacking |
| Training Performance (Original) | 0.997647 | 1.0 | 0.981341 | 1.0 | 0.998151 |
| Training Performance (Over) | 0.996134 | 1.0 | 0.95848 | 1.0 | 0.996638 |
| Training Performance (Under) | 0.993854 | 1.0 | 0.94381 | 1.0 | 0.998244 |
| Testing Performance (Original) | 0.975304 | 0.989024 | 0.981968 | 0.987064 | 0.983536 |
| Testing Performance (Over) | 0.95884 | 0.974128 | 0.954136 | 0.984712 | 0.973344 |
| Testing Performance (Under) | 0.915327 | 0.925519 | 0.923951 | 0.950216 | 0.944728 |
fig = go.Figure()
for col in results.columns[1:]:
fig.add_trace(go.Scatter(x=results['Model'], y=results[col], mode='lines+markers', name=col))
fig.update_layout(title='Performance for Each Model',
xaxis_title='Model',
yaxis_title='Performance')
fig.show()
All models performed best on the Original dataset, with RandomForest performing best overall. AdaBoost had the closest fit between the training and testing sets on the Original data, showing neither overfitting nor underfitting; this could be a good indication that it handles the dataset well and could improve further after tuning.

Models chosen for tuning (with reasoning):

- RandomForest: highest recall score on both the training and testing sets for the Original data.
- AdaBoost: closest recall score between training and testing, indicating it fits the data well.
- XGBoost: second-best recall on the Original data set, and a popular choice due to its robustness; studies have shown XGBoost, after proper hyperparameter tuning, to be dependable.
- Stacking (bonus): after tuning, it will be interesting to see whether this model's performance increases (or decreases).

F1 Score > Recall
Initially, as illustrated above with the broader set of models, the focus was on recall, since our goal is to reduce false negatives as much as possible. However, maximizing recall yielded unfavorable results during hyperparameter tuning: the models bought recall by heavily favoring false positives. After some testing, maximizing F1 score instead gave the models a more desirable balance between false positives and false negatives, so the tuning objective was switched to F1.
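The switch is easier to justify from the definition: F1 is the harmonic mean of precision and recall, so it penalizes a model that inflates recall by flooding the positive class. A small check with sklearn:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0]   # perfect recall, poor precision

p = precision_score(y_true, y_pred)   # 4/7
r = recall_score(y_true, y_pred)      # 4/4 = 1.0
f1 = f1_score(y_true, y_pred)         # 2pr / (p + r) = 8/11
print(p, r, f1)
```

Despite a perfect recall of 1.0, the F1 score drops to about 0.73, exposing the three false positives that recall alone hides.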
# create recall score function
def recall(y_true, y_pred):
return recall_score(y_true, y_pred)
# Evaluation Function
def show_scores(model):
train_preds = model.predict(X_train)
val_preds = model.predict(X_test)
scores = {"Training Recall": recall(y_train, train_preds),
"Testing Recall": recall(y_test, val_preds),
"Training Precision": precision_score(y_train, train_preds),
"Testing Precision": precision_score(y_test, val_preds),
"Training F1 Score": f1_score(y_train, train_preds),
"Testing F1 Score": f1_score(y_test, val_preds),}
return scores
# defining kfold
kfold = StratifiedKFold(n_splits=10, random_state=42, shuffle = True)
def show_kfold_scores(model):
train_preds = cross_val_score(model, X_train, y_train, cv=kfold, scoring = 'f1')
kfold_scores = {"Training F1 Scores": train_preds,
"F1 Repeatability (Training)": "%.3f%% (%.3f%%)" % (train_preds.mean() * 100.0, train_preds.std() * 100.0)
}
return kfold_scores
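StratifiedKFold preserves the class ratio in every fold, which keeps the per-fold F1 scores comparable; a quick check on a synthetic imbalanced target:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([1] * 840 + [0] * 160)   # ~84/16 imbalance
X = np.zeros((len(y), 1))             # dummy feature

skf = StratifiedKFold(n_splits=10, random_state=42, shuffle=True)
fold_ratios = [y[test].mean() for _, test in skf.split(X, y)]
print(fold_ratios)   # every fold keeps 84% positives
```

A plain KFold on shuffled data would only approximate this ratio, adding fold-to-fold variance to the cross-validated F1 estimate.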
RandomForest Tuning

# RandomForest Tuning
rf_grid = {"n_estimators": np.arange(100, 1200, 50),
"max_depth": [None, 5, 10, 15, 20, 30],
"min_samples_split": np.arange(2, 20, 2),
"min_samples_leaf": np.arange(1, 20, 2),
"max_features": [0.5, 1, "sqrt", "auto", None, 'log2'],
"bootstrap": [True]}
# Instantiate RandomizedSearchCV model
rf_model = RandomizedSearchCV(RandomForestClassifier(n_jobs=-1,
random_state=42),
param_distributions=rf_grid,
n_iter=50,
cv=5,
verbose=2,
scoring='f1')
# Fit the RandomizedSearchCV model
rf_model.fit(X_train, y_train)
Fitting 5 folds for each of 50 candidates, totalling 250 fits
[CV] END bootstrap=True, max_depth=5, max_features=log2, min_samples_leaf=11, min_samples_split=4, n_estimators=550; total time= 0.7s
... (remaining verbose cross-validation log truncated; the max_features=auto candidates report 0.0s because those fits error out) ...
max_features=0.5, min_samples_leaf=3, min_samples_split=6, n_estimators=500; total time= 0.9s [CV] END bootstrap=True, max_depth=10, max_features=0.5, min_samples_leaf=3, min_samples_split=6, n_estimators=500; total time= 0.8s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=5, min_samples_split=10, n_estimators=200; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=5, min_samples_split=10, n_estimators=200; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=5, min_samples_split=10, n_estimators=200; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=5, min_samples_split=10, n_estimators=200; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=5, min_samples_split=10, n_estimators=200; total time= 0.0s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=14, n_estimators=600; total time= 0.7s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=14, n_estimators=600; total time= 0.7s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=14, n_estimators=600; total time= 0.7s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=14, n_estimators=600; total time= 0.7s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=14, n_estimators=600; total time= 0.7s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=11, min_samples_split=2, n_estimators=900; total time= 0.0s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=11, min_samples_split=2, n_estimators=900; total time= 0.0s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=11, min_samples_split=2, n_estimators=900; total time= 0.0s [CV] 
END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=11, min_samples_split=2, n_estimators=900; total time= 0.0s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=11, min_samples_split=2, n_estimators=900; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=10, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=15, max_features=None, min_samples_leaf=13, min_samples_split=10, n_estimators=1000; total time= 2.3s [CV] END bootstrap=True, max_depth=15, max_features=None, min_samples_leaf=13, min_samples_split=10, n_estimators=1000; total time= 2.3s [CV] END bootstrap=True, max_depth=15, max_features=None, min_samples_leaf=13, min_samples_split=10, n_estimators=1000; total time= 2.2s [CV] END bootstrap=True, max_depth=15, max_features=None, min_samples_leaf=13, min_samples_split=10, n_estimators=1000; total time= 2.6s [CV] END bootstrap=True, max_depth=15, max_features=None, min_samples_leaf=13, min_samples_split=10, n_estimators=1000; total time= 2.2s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=17, min_samples_split=12, n_estimators=550; total time= 0.7s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=17, min_samples_split=12, n_estimators=550; total time= 0.7s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=17, min_samples_split=12, 
n_estimators=550; total time= 0.7s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=17, min_samples_split=12, n_estimators=550; total time= 0.7s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=17, min_samples_split=12, n_estimators=550; total time= 0.7s [CV] END bootstrap=True, max_depth=5, max_features=sqrt, min_samples_leaf=9, min_samples_split=4, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=5, max_features=sqrt, min_samples_leaf=9, min_samples_split=4, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=5, max_features=sqrt, min_samples_leaf=9, min_samples_split=4, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=5, max_features=sqrt, min_samples_leaf=9, min_samples_split=4, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=5, max_features=sqrt, min_samples_leaf=9, min_samples_split=4, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=13, min_samples_split=10, n_estimators=900; total time= 1.0s [CV] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=13, min_samples_split=10, n_estimators=900; total time= 1.0s [CV] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=13, min_samples_split=10, n_estimators=900; total time= 1.1s [CV] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=13, min_samples_split=10, n_estimators=900; total time= 1.0s [CV] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=13, min_samples_split=10, n_estimators=900; total time= 1.0s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=2, n_estimators=600; total time= 1.4s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=2, n_estimators=600; total time= 1.4s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, 
min_samples_split=2, n_estimators=600; total time= 1.4s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=2, n_estimators=600; total time= 1.5s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=2, n_estimators=600; total time= 1.5s [CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=19, min_samples_split=18, n_estimators=350; total time= 0.8s [CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=19, min_samples_split=18, n_estimators=350; total time= 0.8s [CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=19, min_samples_split=18, n_estimators=350; total time= 0.7s [CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=19, min_samples_split=18, n_estimators=350; total time= 0.8s [CV] END bootstrap=True, max_depth=30, max_features=None, min_samples_leaf=19, min_samples_split=18, n_estimators=350; total time= 0.7s [CV] END bootstrap=True, max_depth=10, max_features=sqrt, min_samples_leaf=13, min_samples_split=16, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=10, max_features=sqrt, min_samples_leaf=13, min_samples_split=16, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=10, max_features=sqrt, min_samples_leaf=13, min_samples_split=16, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=10, max_features=sqrt, min_samples_leaf=13, min_samples_split=16, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=10, max_features=sqrt, min_samples_leaf=13, min_samples_split=16, n_estimators=450; total time= 0.5s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=3, min_samples_split=16, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=3, min_samples_split=16, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=30, 
max_features=sqrt, min_samples_leaf=3, min_samples_split=16, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=3, min_samples_split=16, n_estimators=800; total time= 1.0s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=3, min_samples_split=16, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=17, min_samples_split=16, n_estimators=1100; total time= 1.2s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=17, min_samples_split=16, n_estimators=1100; total time= 1.2s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=17, min_samples_split=16, n_estimators=1100; total time= 1.2s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=17, min_samples_split=16, n_estimators=1100; total time= 1.2s [CV] END bootstrap=True, max_depth=30, max_features=sqrt, min_samples_leaf=17, min_samples_split=16, n_estimators=1100; total time= 1.2s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=7, min_samples_split=6, n_estimators=550; total time= 1.4s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=7, min_samples_split=6, n_estimators=550; total time= 1.5s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=7, min_samples_split=6, n_estimators=550; total time= 1.4s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=7, min_samples_split=6, n_estimators=550; total time= 1.3s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=7, min_samples_split=6, n_estimators=550; total time= 1.3s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=7, min_samples_split=6, n_estimators=700; total time= 0.7s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=7, min_samples_split=6, n_estimators=700; total time= 0.7s [CV] END 
bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=7, min_samples_split=6, n_estimators=700; total time= 0.7s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=7, min_samples_split=6, n_estimators=700; total time= 0.7s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=7, min_samples_split=6, n_estimators=700; total time= 0.7s [CV] END bootstrap=True, max_depth=10, max_features=1, min_samples_leaf=1, min_samples_split=18, n_estimators=550; total time= 0.6s [CV] END bootstrap=True, max_depth=10, max_features=1, min_samples_leaf=1, min_samples_split=18, n_estimators=550; total time= 0.6s [CV] END bootstrap=True, max_depth=10, max_features=1, min_samples_leaf=1, min_samples_split=18, n_estimators=550; total time= 0.6s [CV] END bootstrap=True, max_depth=10, max_features=1, min_samples_leaf=1, min_samples_split=18, n_estimators=550; total time= 0.6s [CV] END bootstrap=True, max_depth=10, max_features=1, min_samples_leaf=1, min_samples_split=18, n_estimators=550; total time= 0.6s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=7, min_samples_split=18, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=7, min_samples_split=18, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=7, min_samples_split=18, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=7, min_samples_split=18, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=20, max_features=auto, min_samples_leaf=7, min_samples_split=18, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=30, max_features=1, min_samples_leaf=17, min_samples_split=2, n_estimators=100; total time= 0.1s [CV] END bootstrap=True, max_depth=30, max_features=1, min_samples_leaf=17, min_samples_split=2, n_estimators=100; total time= 0.1s [CV] END 
bootstrap=True, max_depth=30, max_features=1, min_samples_leaf=17, min_samples_split=2, n_estimators=100; total time= 0.1s [CV] END bootstrap=True, max_depth=30, max_features=1, min_samples_leaf=17, min_samples_split=2, n_estimators=100; total time= 0.1s [CV] END bootstrap=True, max_depth=30, max_features=1, min_samples_leaf=17, min_samples_split=2, n_estimators=100; total time= 0.1s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=17, min_samples_split=12, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=17, min_samples_split=12, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=17, min_samples_split=12, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=17, min_samples_split=12, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=17, min_samples_split=12, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=5, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=5, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=5, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=5, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=5, max_features=auto, min_samples_leaf=1, min_samples_split=14, n_estimators=1150; total time= 0.0s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=19, min_samples_split=6, n_estimators=250; total time= 0.3s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=19, min_samples_split=6, n_estimators=250; total time= 
0.3s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=19, min_samples_split=6, n_estimators=250; total time= 0.3s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=19, min_samples_split=6, n_estimators=250; total time= 0.3s [CV] END bootstrap=True, max_depth=5, max_features=1, min_samples_leaf=19, min_samples_split=6, n_estimators=250; total time= 0.3s [CV] END bootstrap=True, max_depth=30, max_features=log2, min_samples_leaf=9, min_samples_split=4, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=30, max_features=log2, min_samples_leaf=9, min_samples_split=4, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=30, max_features=log2, min_samples_leaf=9, min_samples_split=4, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=30, max_features=log2, min_samples_leaf=9, min_samples_split=4, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=30, max_features=log2, min_samples_leaf=9, min_samples_split=4, n_estimators=150; total time= 0.2s [CV] END bootstrap=True, max_depth=10, max_features=0.5, min_samples_leaf=15, min_samples_split=6, n_estimators=600; total time= 0.9s [CV] END bootstrap=True, max_depth=10, max_features=0.5, min_samples_leaf=15, min_samples_split=6, n_estimators=600; total time= 1.0s [CV] END bootstrap=True, max_depth=10, max_features=0.5, min_samples_leaf=15, min_samples_split=6, n_estimators=600; total time= 0.9s [CV] END bootstrap=True, max_depth=10, max_features=0.5, min_samples_leaf=15, min_samples_split=6, n_estimators=600; total time= 0.9s [CV] END bootstrap=True, max_depth=10, max_features=0.5, min_samples_leaf=15, min_samples_split=6, n_estimators=600; total time= 0.9s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=13, min_samples_split=18, n_estimators=450; total time= 0.7s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=13, min_samples_split=18, n_estimators=450; total 
time= 0.7s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=13, min_samples_split=18, n_estimators=450; total time= 0.7s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=13, min_samples_split=18, n_estimators=450; total time= 0.7s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=13, min_samples_split=18, n_estimators=450; total time= 0.7s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=1, min_samples_split=12, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=1, min_samples_split=12, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=1, min_samples_split=12, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=1, min_samples_split=12, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=20, max_features=log2, min_samples_leaf=1, min_samples_split=12, n_estimators=800; total time= 0.9s [CV] END bootstrap=True, max_depth=None, max_features=0.5, min_samples_leaf=13, min_samples_split=12, n_estimators=400; total time= 0.6s [CV] END bootstrap=True, max_depth=None, max_features=0.5, min_samples_leaf=13, min_samples_split=12, n_estimators=400; total time= 0.6s [CV] END bootstrap=True, max_depth=None, max_features=0.5, min_samples_leaf=13, min_samples_split=12, n_estimators=400; total time= 0.6s [CV] END bootstrap=True, max_depth=None, max_features=0.5, min_samples_leaf=13, min_samples_split=12, n_estimators=400; total time= 0.6s [CV] END bootstrap=True, max_depth=None, max_features=0.5, min_samples_leaf=13, min_samples_split=12, n_estimators=400; total time= 0.6s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=3, min_samples_split=4, n_estimators=150; total time= 0.3s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=3, 
min_samples_split=4, n_estimators=150; total time= 0.3s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=3, min_samples_split=4, n_estimators=150; total time= 0.3s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=3, min_samples_split=4, n_estimators=150; total time= 0.3s [CV] END bootstrap=True, max_depth=30, max_features=0.5, min_samples_leaf=3, min_samples_split=4, n_estimators=150; total time= 0.3s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=10, n_estimators=200; total time= 0.5s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=10, n_estimators=200; total time= 0.5s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=10, n_estimators=200; total time= 0.5s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=10, n_estimators=200; total time= 0.5s [CV] END bootstrap=True, max_depth=20, max_features=None, min_samples_leaf=11, min_samples_split=10, n_estimators=200; total time= 0.5s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=19, min_samples_split=8, n_estimators=300; total time= 0.3s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=19, min_samples_split=8, n_estimators=300; total time= 0.3s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=19, min_samples_split=8, n_estimators=300; total time= 0.3s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=19, min_samples_split=8, n_estimators=300; total time= 0.3s [CV] END bootstrap=True, max_depth=None, max_features=sqrt, min_samples_leaf=19, min_samples_split=8, n_estimators=300; total time= 0.3s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=19, min_samples_split=16, n_estimators=600; total time= 0.8s [CV] END bootstrap=True, max_depth=5, 
max_features=0.5, min_samples_leaf=19, min_samples_split=16, n_estimators=600; total time= 0.8s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=19, min_samples_split=16, n_estimators=600; total time= 0.8s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=19, min_samples_split=16, n_estimators=600; total time= 0.8s [CV] END bootstrap=True, max_depth=5, max_features=0.5, min_samples_leaf=19, min_samples_split=16, n_estimators=600; total time= 0.8s [CV] END bootstrap=True, max_depth=15, max_features=0.5, min_samples_leaf=3, min_samples_split=18, n_estimators=1050; total time= 1.8s [CV] END bootstrap=True, max_depth=15, max_features=0.5, min_samples_leaf=3, min_samples_split=18, n_estimators=1050; total time= 1.7s [CV] END bootstrap=True, max_depth=15, max_features=0.5, min_samples_leaf=3, min_samples_split=18, n_estimators=1050; total time= 1.7s [CV] END bootstrap=True, max_depth=15, max_features=0.5, min_samples_leaf=3, min_samples_split=18, n_estimators=1050; total time= 1.7s [CV] END bootstrap=True, max_depth=15, max_features=0.5, min_samples_leaf=3, min_samples_split=18, n_estimators=1050; total time= 1.7s [CV] END bootstrap=True, max_depth=15, max_features=sqrt, min_samples_leaf=17, min_samples_split=18, n_estimators=750; total time= 0.8s [CV] END bootstrap=True, max_depth=15, max_features=sqrt, min_samples_leaf=17, min_samples_split=18, n_estimators=750; total time= 0.8s [CV] END bootstrap=True, max_depth=15, max_features=sqrt, min_samples_leaf=17, min_samples_split=18, n_estimators=750; total time= 0.8s [CV] END bootstrap=True, max_depth=15, max_features=sqrt, min_samples_leaf=17, min_samples_split=18, n_estimators=750; total time= 0.8s [CV] END bootstrap=True, max_depth=15, max_features=sqrt, min_samples_leaf=17, min_samples_split=18, n_estimators=750; total time= 0.8s
RandomizedSearchCV(cv=5,
                   estimator=RandomForestClassifier(n_jobs=-1, random_state=42),
                   n_iter=50,
                   param_distributions={'bootstrap': [True],
                                        'max_depth': [None, 5, 10, 15, 20, 30],
                                        'max_features': [0.5, 1, 'sqrt', 'auto',
                                                         None, 'log2'],
                                        'min_samples_leaf': array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19]),
                                        'min_samples_split': array([ 2,  4,  6,  8, 10, 12, 14, 16, 18]),
                                        'n_estimators': array([ 100,  150,  200,  250,  300,  350,  400,  450,  500,  550,  600,
        650,  700,  750,  800,  850,  900,  950, 1000, 1050, 1100, 1150])},
                   scoring='f1', verbose=2)
show_scores(rf_model)
{'Training Recall': 0.9956295175659775,
'Testing Recall': 0.9847118776950216,
'Training Precision': 0.9886496411283592,
'Testing Precision': 0.9676425269645609,
'Training F1 Score': 0.9921273031825796,
'Testing F1 Score': 0.9761025840295318}
rf_model_scores = show_scores(rf_model)
rf_model_scores = ['Tuned Random Forest', *rf_model_scores.values()]
comp_df = pd.DataFrame([rf_model_scores], columns=['Model', 'Training Recall', 'Testing Recall', 'Training Precision', 'Testing Precision', 'Training F1 Score', 'Testing F1 Score'])
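Rows for the other tuned models can be appended to the comparison table the same way. A minimal self-contained sketch of the pattern, using hypothetical score dictionaries in place of the real `show_scores(model)` output:

```python
import pandas as pd

columns = ['Model', 'Training Recall', 'Testing Recall', 'Training Precision',
           'Testing Precision', 'Training F1 Score', 'Testing F1 Score']

# Hypothetical stand-ins for show_scores(model) output; real values
# come from the fitted models in the notebook.
rf_scores = {'Training Recall': 0.996, 'Testing Recall': 0.985,
             'Training Precision': 0.989, 'Testing Precision': 0.968,
             'Training F1 Score': 0.992, 'Testing F1 Score': 0.976}
ada_scores = {'Training Recall': 0.98, 'Testing Recall': 0.97,
              'Training Precision': 0.97, 'Testing Precision': 0.96,
              'Training F1 Score': 0.975, 'Testing F1 Score': 0.965}

# One row per model, unpacked in the same column order
comp_df = pd.DataFrame(
    [['Tuned Random Forest', *rf_scores.values()],
     ['Tuned AdaBoost', *ada_scores.values()]],
    columns=columns)
print(comp_df)
```

Keeping all models in a single DataFrame makes the final side-by-side comparison a one-liner rather than a manual copy of six metrics per model.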
best_regression = rf_model.best_estimator_
best_regression
RandomForestClassifier(max_depth=10, max_features=0.5, min_samples_leaf=3,
min_samples_split=6, n_estimators=500, n_jobs=-1,
random_state=42)
show_kfold_scores(best_regression)
{'Training F1 Scores': array([0.97829716, 0.97900924, 0.97410192, 0.97829716, 0.97019868,
0.97740586, 0.97911445, 0.975 , 0.97833333, 0.97921862]),
'F1 Repeatability (Training) %.3f%% (%.3f%%)': (97.68976420765298,
0.27801092440219194)}
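The `show_kfold_scores` helper is defined earlier in the notebook; conceptually it reports the mean and standard deviation of cross-validated F1 scores as a repeatability check. A self-contained sketch of the same idea on synthetic data (the names and dataset here are illustrative, not the notebook's own):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data roughly mirroring the ~16% attrition rate
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     weights=[0.84, 0.16], random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 10-fold stratified CV scored on F1; a small std dev indicates the
# score is stable across folds rather than a lucky split
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X_demo, y_demo, scoring='f1', cv=cv)
print('F1 Repeatability %.3f%% (%.3f%%)' % (scores.mean() * 100,
                                            scores.std() * 100))
```

Stratified folds matter here: with an imbalanced target, plain K-fold could leave some folds nearly empty of attriters and make the per-fold F1 scores incomparable.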
confusion_matrix_sklearn("Tuned RandomForest", rf_model, X_test, y_test)
Best Model
RandomForestClassifier(max_depth=5, max_features=1, min_samples_leaf=11, min_samples_split=10, n_estimators=450, n_jobs=-1, random_state=42)
best_regression.fit(X, y)
# Get feature importances
importances = best_regression.feature_importances_
# Create a DataFrame with feature names and importances
feature_importances = pd.DataFrame({'Feature': X.columns, 'Importance': importances})
# Sort the DataFrame by importance in descending order
feature_importances = feature_importances.sort_values('Importance', ascending=True)
# Plot the feature importances
plt.figure(figsize=(10, 6))
plt.barh(feature_importances['Feature'], feature_importances['Importance'])
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importance')
plt.show()
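Because missing an attriting customer is the costlier error for the bank, the default 0.5 decision threshold is also worth inspecting: lowering it trades precision for recall. A minimal sketch on synthetic data (the 0.3 threshold is illustrative, not a tuned value):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the churn dataset
Xd, yd = make_classification(n_samples=1000, weights=[0.84, 0.16],
                             random_state=42)
Xtr, Xte, ytr, yte = train_test_split(Xd, yd, stratify=yd, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(Xtr, ytr)
proba = clf.predict_proba(Xte)[:, 1]  # probability of the positive (attrited) class

default_preds = (proba >= 0.5).astype(int)
lowered_preds = (proba >= 0.3).astype(int)  # lower threshold catches more attriters
print('recall @0.5:', recall_score(yte, default_preds))
print('recall @0.3:', recall_score(yte, lowered_preds))
```

Lowering the threshold can only keep or increase recall (every positive prediction at 0.5 is still positive at 0.3), so the choice becomes a business trade-off against the extra false alarms.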
Observations
Even when tuned on F1 score, the model continued to misclassify some customers as existing who would in fact attrite; without additional tuning, these false negatives persist.
AdaBoost Tuning
# AdaBoost hyperparameter grid
ad_grid = {"n_estimators": randint(50, 500),
"learning_rate": [0.01, 0.05, 0.1, 0.3, 1],
"algorithm": ['SAMME', 'SAMME.R'],
}
# Instantiate RandomizedSearchCV model
ad_model = RandomizedSearchCV(AdaBoostClassifier(random_state=42),
param_distributions=ad_grid,
n_iter=50,
cv=5,
verbose=2,
scoring='f1')
# Fit the RandomizedSearchCV model
ad_model.fit(X_train, y_train)
Fitting 5 folds for each of 50 candidates, totalling 250 fits
[verbose per-fold [CV] END output trimmed: 250 AdaBoost fits logged, each completing in roughly 0.3–2.6 s per fold]
algorithm=SAMME, learning_rate=0.3, n_estimators=356; total time= 2.0s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=356; total time= 2.0s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=259; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=259; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=259; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=259; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=259; total time= 1.5s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=199; total time= 1.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=199; total time= 1.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=199; total time= 1.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=199; total time= 1.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=199; total time= 1.1s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=255; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=255; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=255; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=255; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=255; total time= 1.5s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=330; total time= 1.9s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=330; total time= 1.9s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=330; 
total time= 1.9s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=330; total time= 1.9s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=330; total time= 1.9s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=249; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=249; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=249; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=249; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=249; total time= 1.4s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=266; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=266; total time= 1.6s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=266; total time= 1.6s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=266; total time= 1.6s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=266; total time= 1.5s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=469; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=469; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=469; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=469; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=469; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=473; total time= 2.8s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=473; total time= 2.8s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=473; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=473; total time= 2.8s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=473; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=100; total time= 0.6s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=100; total time= 0.6s [CV] END 
algorithm=SAMME.R, learning_rate=0.3, n_estimators=100; total time= 0.6s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=100; total time= 0.6s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=100; total time= 0.6s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=477; total time= 2.7s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=477; total time= 2.7s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=477; total time= 2.7s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=477; total time= 2.6s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=477; total time= 2.6s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=293; total time= 1.6s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=293; total time= 1.6s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=293; total time= 1.6s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=293; total time= 1.6s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=293; total time= 1.6s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=479; total time= 2.8s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=479; total time= 2.8s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=479; total time= 2.8s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=479; total time= 2.8s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=479; total time= 2.8s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=396; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=396; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=396; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=396; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=396; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=188; total time= 1.1s [CV] END algorithm=SAMME.R, learning_rate=1, 
n_estimators=188; total time= 1.1s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=188; total time= 1.1s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=188; total time= 1.1s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=188; total time= 1.1s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=129; total time= 0.7s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=129; total time= 0.7s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=129; total time= 0.7s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=129; total time= 0.7s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=129; total time= 0.7s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=50; total time= 0.3s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=50; total time= 0.3s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=50; total time= 0.3s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=50; total time= 0.3s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=50; total time= 0.3s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=370; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=370; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=370; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=370; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=370; total time= 2.1s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=346; total time= 2.0s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=346; total time= 2.0s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=346; total time= 2.0s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=346; total time= 2.0s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=346; total time= 2.0s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=312; total time= 1.8s [CV] END 
algorithm=SAMME.R, learning_rate=0.01, n_estimators=312; total time= 1.8s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=312; total time= 1.8s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=312; total time= 1.8s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=312; total time= 1.8s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=329; total time= 1.8s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=329; total time= 1.9s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=329; total time= 1.9s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=329; total time= 1.9s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=329; total time= 1.8s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=112; total time= 0.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=112; total time= 0.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=112; total time= 0.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=112; total time= 0.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=112; total time= 0.7s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.0s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=369; total time= 2.1s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=242; total time= 1.4s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=242; total time= 1.4s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=242; total time= 1.4s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=242; total time= 1.4s [CV] END algorithm=SAMME, learning_rate=0.1, n_estimators=242; total time= 1.3s [CV] END algorithm=SAMME, learning_rate=0.01, 
n_estimators=76; total time= 0.4s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=76; total time= 0.4s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=76; total time= 0.4s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=76; total time= 0.4s [CV] END algorithm=SAMME, learning_rate=0.01, n_estimators=76; total time= 0.4s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=267; total time= 1.6s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=267; total time= 1.6s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=267; total time= 1.6s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=267; total time= 1.6s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=267; total time= 1.6s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=328; total time= 1.8s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=328; total time= 1.8s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=328; total time= 1.8s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=328; total time= 1.8s [CV] END algorithm=SAMME, learning_rate=0.3, n_estimators=328; total time= 1.8s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=401; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=401; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=401; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=401; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=401; total time= 2.3s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=497; total time= 2.9s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=497; total time= 2.9s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=497; total time= 2.9s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=497; total time= 2.9s [CV] END algorithm=SAMME.R, learning_rate=0.05, n_estimators=497; total time= 2.9s [CV] END 
algorithm=SAMME.R, learning_rate=0.01, n_estimators=213; total time= 1.2s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=213; total time= 1.2s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=213; total time= 1.2s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=213; total time= 1.3s [CV] END algorithm=SAMME.R, learning_rate=0.01, n_estimators=213; total time= 1.2s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=461; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=461; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=461; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=461; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=1, n_estimators=461; total time= 2.6s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=230; total time= 1.3s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=230; total time= 1.3s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=230; total time= 1.3s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=230; total time= 1.3s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=230; total time= 1.3s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=88; total time= 0.5s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=88; total time= 0.5s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=88; total time= 0.5s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=88; total time= 0.5s [CV] END algorithm=SAMME, learning_rate=0.05, n_estimators=88; total time= 0.5s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=468; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=468; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=468; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=0.1, n_estimators=468; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=0.1, 
n_estimators=468; total time= 2.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=116; total time= 0.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=116; total time= 0.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=116; total time= 0.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=116; total time= 0.7s [CV] END algorithm=SAMME.R, learning_rate=0.3, n_estimators=116; total time= 0.7s
RandomizedSearchCV(cv=5, estimator=AdaBoostClassifier(random_state=42),
                   n_iter=50,
                   param_distributions={'algorithm': ['SAMME', 'SAMME.R'],
                                        'learning_rate': [0.01, 0.05, 0.1, 0.3, 1],
                                        'n_estimators': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x2f0104380>},
                   scoring='f1', verbose=2)
show_scores(ad_model)
{'Training Recall': 0.9865523617414692,
'Testing Recall': 0.9851038808310466,
'Training Precision': 0.9757273482959269,
'Testing Precision': 0.9710200927357032,
'Training F1 Score': 0.9811099966566366,
'Testing F1 Score': 0.9780112862424596}
ad_model_scores = show_scores(ad_model)
ad_model_scores = ['Tuned AdaBoost', *ad_model_scores.values()]
comp_ad = pd.DataFrame([ad_model_scores], columns=['Model', 'Training Recall', 'Testing Recall', 'Training Precision', 'Testing Precision', 'Training F1 Score', 'Testing F1 Score'])
comp_df = pd.concat([comp_df, comp_ad], ignore_index=True)
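`show_scores` is a project helper defined earlier in the notebook; as a point of reference, a minimal sketch of what it is assumed to do (compute train/test recall, precision, and F1 for a fitted model and return them as the dict consumed by the comparison table) could look like this, here on synthetic data so it is self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

def show_scores_sketch(model, X_train, y_train, X_test, y_test):
    """Assumed behavior of the notebook's show_scores helper:
    score a fitted model on both splits and return the metric dict."""
    train_pred = model.predict(X_train)
    test_pred = model.predict(X_test)
    return {
        "Training Recall": recall_score(y_train, train_pred),
        "Testing Recall": recall_score(y_test, test_pred),
        "Training Precision": precision_score(y_train, train_pred),
        "Testing Precision": precision_score(y_test, test_pred),
        "Training F1 Score": f1_score(y_train, train_pred),
        "Testing F1 Score": f1_score(y_test, test_pred),
    }

# Synthetic stand-in data (the notebook uses its own X_train/X_test splits)
X, y = make_classification(n_samples=300, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
clf = AdaBoostClassifier(random_state=42).fit(X_tr, y_tr)
scores = show_scores_sketch(clf, X_tr, y_tr, X_te, y_te)
```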
best_adaboost = ad_model.best_estimator_
best_adaboost
AdaBoostClassifier(learning_rate=0.3, n_estimators=267, random_state=42)
show_kfold_scores(best_adaboost)
{'Training F1 Scores': array([0.97573222, 0.98 , 0.97333333, 0.97751873, 0.98 ,
0.97666667, 0.97414512, 0.97666667, 0.96979866, 0.97993311]),
'F1 Repeatability (Training) %.3f%% (%.3f%%)': (97.63794507648024,
0.31353630149066325)}
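`show_kfold_scores` is likewise a notebook helper; judging from its output, it is assumed to run k-fold cross-validation with F1 scoring and report the mean and standard deviation as a repeatability percentage. A minimal sketch under that assumption, on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the notebook's X, y
X, y = make_classification(n_samples=300, random_state=42)

est = AdaBoostClassifier(random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# One F1 score per fold; mean/std summarize repeatability
fold_scores = cross_val_score(est, X, y, scoring="f1", cv=cv)
mean_pct = fold_scores.mean() * 100
std_pct = fold_scores.std() * 100
```

A small spread (the ~0.31% standard deviation reported above) indicates the tuned model's F1 is stable across folds rather than an artifact of one lucky split.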
best_adaboost.fit(X, y)
# Get feature importances
importances = best_adaboost.feature_importances_
# Create a DataFrame with feature names and importances
feature_importances = pd.DataFrame({'Feature': X.columns, 'Importance': importances})
# Sort the DataFrame by importance in descending order
feature_importances = feature_importances.sort_values('Importance', ascending=True)
# Plot the feature importances
plt.figure(figsize=(10, 6))
plt.barh(feature_importances['Feature'], feature_importances['Importance'])
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importance')
plt.show()
confusion_matrix_sklearn("Tuned AdaBoost", best_adaboost, X_test, y_test)
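`confusion_matrix_sklearn` is a project helper; a hedged sketch of an equivalent using scikit-learn's built-in `ConfusionMatrixDisplay` (assumed behavior: plot the test-set confusion matrix under a model title), again on synthetic data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's splits
X, y = make_classification(n_samples=200, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
clf = AdaBoostClassifier(random_state=42).fit(X_tr, y_tr)

# Rows: true class, columns: predicted class
cm = confusion_matrix(y_te, clf.predict(X_te))
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.title("Tuned AdaBoost")
```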
Observations

- AdaBoost yielded more balanced results between false positives and false negatives than Random Forest.
- AdaBoost repeatedly concentrated its importance on one or two features, which makes it susceptible to over-weighting them.

XGBoost

RandomizedSearchCV

xgb_grid = {
"n_estimators": [50, 100, 200, 300],
"max_depth": randint(3, 10),
"min_child_weight": [1, 2, 4],
"gamma": uniform(0, 0.5),
"subsample": [0.6, 0.8, 1.0],
"colsample_bytree": [0.6, 0.8, 1.0],
"learning_rate": [0.01, 0.05, 0.1],
"colsample_bylevel": [0.6, 0.8, 1.0]
}
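The `randint` and `uniform` entries in this grid are frozen SciPy distributions rather than fixed lists: RandomizedSearchCV draws each candidate's value from them via `.rvs()`. A minimal sketch of that sampling (variable names here are illustrative):

```python
from scipy.stats import randint, uniform

# Same distributions as in xgb_grid above
max_depth_dist = randint(3, 10)  # integers 3..9 (upper bound exclusive)
gamma_dist = uniform(0, 0.5)     # floats uniform on [0, 0.5)

# RandomizedSearchCV effectively does this once per candidate
depths = max_depth_dist.rvs(size=5, random_state=42)
gammas = gamma_dist.rvs(size=5, random_state=42)
```

Mixing lists (sampled uniformly at random) with distributions is what lets 50 iterations cover this 8-dimensional space far more cheaply than an exhaustive grid search would.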
grid_obj = RandomizedSearchCV(
estimator=XGBClassifier(),
param_distributions=xgb_grid,
scoring='f1',
n_iter=50,
cv=5,
verbose=3,
random_state=42,
n_jobs=1)
grid_obj.fit(X_train, y_train)
Fitting 5 folds for each of 50 candidates, totalling 250 fits
[verbose cross-validation log trimmed: "[CV x/5] END ..." lines reporting each sampled XGBoost parameter set (colsample_bylevel, colsample_bytree, gamma, learning_rate, max_depth, min_child_weight, n_estimators, subsample) with per-fold F1 scores of roughly 0.938-0.985 and fit times of 0.0-0.3 s]
END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.020216794769215674, learning_rate=0.1, max_depth=9, min_child_weight=2, n_estimators=100, subsample=0.6;, score=0.982 total time= 0.1s [CV 1/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4478817978367597, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=0.8;, score=0.979 total time= 0.3s [CV 2/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4478817978367597, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=0.8;, score=0.978 total time= 0.3s [CV 3/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4478817978367597, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=0.8;, score=0.976 total time= 0.3s [CV 4/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4478817978367597, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=0.8;, score=0.980 total time= 0.3s [CV 5/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4478817978367597, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=0.8;, score=0.978 total time= 0.3s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.3022086896389086, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.982 total time= 0.2s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.3022086896389086, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.983 total time= 0.2s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.3022086896389086, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.982 total time= 0.2s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.3022086896389086, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.984 total time= 0.2s [CV 5/5] 
END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.3022086896389086, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.983 total time= 0.2s [CV 1/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3473924665198523, learning_rate=0.05, max_depth=4, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.977 total time= 0.1s [CV 2/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3473924665198523, learning_rate=0.05, max_depth=4, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.978 total time= 0.1s [CV 3/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3473924665198523, learning_rate=0.05, max_depth=4, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.976 total time= 0.1s [CV 4/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3473924665198523, learning_rate=0.05, max_depth=4, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.981 total time= 0.1s [CV 5/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3473924665198523, learning_rate=0.05, max_depth=4, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.977 total time= 0.1s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.22826728524145512, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=300, subsample=0.8;, score=0.982 total time= 0.3s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.22826728524145512, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=300, subsample=0.8;, score=0.982 total time= 0.3s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.22826728524145512, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=300, subsample=0.8;, score=0.981 total time= 0.3s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.22826728524145512, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=300, subsample=0.8;, score=0.986 total time= 0.3s [CV 
5/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.22826728524145512, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=300, subsample=0.8;, score=0.981 total time= 0.3s [CV 1/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.15900173748593194, learning_rate=0.01, max_depth=3, min_child_weight=1, n_estimators=200, subsample=0.8;, score=0.958 total time= 0.1s [CV 2/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.15900173748593194, learning_rate=0.01, max_depth=3, min_child_weight=1, n_estimators=200, subsample=0.8;, score=0.956 total time= 0.1s [CV 3/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.15900173748593194, learning_rate=0.01, max_depth=3, min_child_weight=1, n_estimators=200, subsample=0.8;, score=0.958 total time= 0.1s [CV 4/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.15900173748593194, learning_rate=0.01, max_depth=3, min_child_weight=1, n_estimators=200, subsample=0.8;, score=0.962 total time= 0.1s [CV 5/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.15900173748593194, learning_rate=0.01, max_depth=3, min_child_weight=1, n_estimators=200, subsample=0.8;, score=0.954 total time= 0.1s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.4303652916281717, learning_rate=0.1, max_depth=8, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.979 total time= 0.1s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.4303652916281717, learning_rate=0.1, max_depth=8, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.979 total time= 0.1s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.4303652916281717, learning_rate=0.1, max_depth=8, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.980 total time= 0.1s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.4303652916281717, learning_rate=0.1, max_depth=8, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.984 total time= 0.1s [CV 
5/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.4303652916281717, learning_rate=0.1, max_depth=8, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.978 total time= 0.1s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.0599326836668414, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=300, subsample=0.6;, score=0.980 total time= 0.2s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.0599326836668414, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=300, subsample=0.6;, score=0.980 total time= 0.2s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.0599326836668414, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=300, subsample=0.6;, score=0.982 total time= 0.2s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.0599326836668414, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=300, subsample=0.6;, score=0.984 total time= 0.2s [CV 5/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.0599326836668414, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=300, subsample=0.6;, score=0.981 total time= 0.2s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.48589104136048034, learning_rate=0.05, max_depth=9, min_child_weight=2, n_estimators=200, subsample=0.6;, score=0.980 total time= 0.2s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.48589104136048034, learning_rate=0.05, max_depth=9, min_child_weight=2, n_estimators=200, subsample=0.6;, score=0.980 total time= 0.2s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.48589104136048034, learning_rate=0.05, max_depth=9, min_child_weight=2, n_estimators=200, subsample=0.6;, score=0.980 total time= 0.2s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.48589104136048034, learning_rate=0.05, max_depth=9, min_child_weight=2, n_estimators=200, subsample=0.6;, score=0.984 total time= 0.2s [CV 
5/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.48589104136048034, learning_rate=0.05, max_depth=9, min_child_weight=2, n_estimators=200, subsample=0.6;, score=0.981 total time= 0.2s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.49887024252447093, learning_rate=0.1, max_depth=3, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.975 total time= 0.1s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.49887024252447093, learning_rate=0.1, max_depth=3, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.977 total time= 0.1s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.49887024252447093, learning_rate=0.1, max_depth=3, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.980 total time= 0.1s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.49887024252447093, learning_rate=0.1, max_depth=3, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.982 total time= 0.1s [CV 5/5] END colsample_bylevel=0.6, colsample_bytree=0.6, gamma=0.49887024252447093, learning_rate=0.1, max_depth=3, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.975 total time= 0.1s [CV 1/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.1725356240133415, learning_rate=0.01, max_depth=5, min_child_weight=2, n_estimators=100, subsample=0.6;, score=0.950 total time= 0.1s [CV 2/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.1725356240133415, learning_rate=0.01, max_depth=5, min_child_weight=2, n_estimators=100, subsample=0.6;, score=0.949 total time= 0.1s [CV 3/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.1725356240133415, learning_rate=0.01, max_depth=5, min_child_weight=2, n_estimators=100, subsample=0.6;, score=0.951 total time= 0.1s [CV 4/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.1725356240133415, learning_rate=0.01, max_depth=5, min_child_weight=2, n_estimators=100, subsample=0.6;, score=0.956 total time= 0.1s 
[CV 5/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.1725356240133415, learning_rate=0.01, max_depth=5, min_child_weight=2, n_estimators=100, subsample=0.6;, score=0.947 total time= 0.1s [CV 1/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1210276357557502, learning_rate=0.1, max_depth=5, min_child_weight=4, n_estimators=50, subsample=1.0;, score=0.980 total time= 0.0s [CV 2/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1210276357557502, learning_rate=0.1, max_depth=5, min_child_weight=4, n_estimators=50, subsample=1.0;, score=0.978 total time= 0.0s [CV 3/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1210276357557502, learning_rate=0.1, max_depth=5, min_child_weight=4, n_estimators=50, subsample=1.0;, score=0.979 total time= 0.0s [CV 4/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1210276357557502, learning_rate=0.1, max_depth=5, min_child_weight=4, n_estimators=50, subsample=1.0;, score=0.982 total time= 0.0s [CV 5/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1210276357557502, learning_rate=0.1, max_depth=5, min_child_weight=4, n_estimators=50, subsample=1.0;, score=0.978 total time= 0.0s [CV 1/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.31615291529678974, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=200, subsample=0.6;, score=0.980 total time= 0.2s [CV 2/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.31615291529678974, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=200, subsample=0.6;, score=0.981 total time= 0.2s [CV 3/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.31615291529678974, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=200, subsample=0.6;, score=0.980 total time= 0.2s [CV 4/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.31615291529678974, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=200, subsample=0.6;, score=0.986 total time= 0.2s [CV 
5/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.31615291529678974, learning_rate=0.05, max_depth=8, min_child_weight=4, n_estimators=200, subsample=0.6;, score=0.979 total time= 0.2s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.09325925519992712, learning_rate=0.01, max_depth=5, min_child_weight=4, n_estimators=300, subsample=1.0;, score=0.975 total time= 0.2s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.09325925519992712, learning_rate=0.01, max_depth=5, min_child_weight=4, n_estimators=300, subsample=1.0;, score=0.974 total time= 0.2s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.09325925519992712, learning_rate=0.01, max_depth=5, min_child_weight=4, n_estimators=300, subsample=1.0;, score=0.975 total time= 0.2s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.09325925519992712, learning_rate=0.01, max_depth=5, min_child_weight=4, n_estimators=300, subsample=1.0;, score=0.980 total time= 0.2s [CV 5/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.09325925519992712, learning_rate=0.01, max_depth=5, min_child_weight=4, n_estimators=300, subsample=1.0;, score=0.972 total time= 0.2s [CV 1/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.4047505230698577, learning_rate=0.1, max_depth=6, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.979 total time= 0.1s [CV 2/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.4047505230698577, learning_rate=0.1, max_depth=6, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.980 total time= 0.1s [CV 3/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.4047505230698577, learning_rate=0.1, max_depth=6, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.978 total time= 0.1s [CV 4/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.4047505230698577, learning_rate=0.1, max_depth=6, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.983 total time= 0.1s [CV 
5/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.4047505230698577, learning_rate=0.1, max_depth=6, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.980 total time= 0.1s [CV 1/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.33784505851964036, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=200, subsample=1.0;, score=0.953 total time= 0.1s [CV 2/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.33784505851964036, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=200, subsample=1.0;, score=0.956 total time= 0.1s [CV 3/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.33784505851964036, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=200, subsample=1.0;, score=0.955 total time= 0.1s [CV 4/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.33784505851964036, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=200, subsample=1.0;, score=0.964 total time= 0.1s [CV 5/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.33784505851964036, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=200, subsample=1.0;, score=0.954 total time= 0.1s [CV 1/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1289708138575778, learning_rate=0.05, max_depth=3, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.970 total time= 0.1s [CV 2/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1289708138575778, learning_rate=0.05, max_depth=3, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.971 total time= 0.1s [CV 3/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1289708138575778, learning_rate=0.05, max_depth=3, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.974 total time= 0.1s [CV 4/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1289708138575778, learning_rate=0.05, max_depth=3, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.974 total time= 0.1s 
[CV 5/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.1289708138575778, learning_rate=0.05, max_depth=3, min_child_weight=1, n_estimators=100, subsample=0.6;, score=0.967 total time= 0.1s [CV 1/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.4502090285816652, learning_rate=0.1, max_depth=9, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.982 total time= 0.1s [CV 2/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.4502090285816652, learning_rate=0.1, max_depth=9, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.980 total time= 0.1s [CV 3/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.4502090285816652, learning_rate=0.1, max_depth=9, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.978 total time= 0.1s [CV 4/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.4502090285816652, learning_rate=0.1, max_depth=9, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.981 total time= 0.1s [CV 5/5] END colsample_bylevel=0.8, colsample_bytree=0.6, gamma=0.4502090285816652, learning_rate=0.1, max_depth=9, min_child_weight=1, n_estimators=50, subsample=1.0;, score=0.980 total time= 0.1s [CV 1/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4281621459390462, learning_rate=0.05, max_depth=3, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.961 total time= 0.0s [CV 2/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4281621459390462, learning_rate=0.05, max_depth=3, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.966 total time= 0.0s [CV 3/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4281621459390462, learning_rate=0.05, max_depth=3, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.965 total time= 0.0s [CV 4/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4281621459390462, learning_rate=0.05, max_depth=3, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.970 total time= 0.0s [CV 5/5] END 
colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.4281621459390462, learning_rate=0.05, max_depth=3, min_child_weight=4, n_estimators=50, subsample=0.8;, score=0.962 total time= 0.0s [CV 1/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.18614138328087154, learning_rate=0.01, max_depth=5, min_child_weight=1, n_estimators=50, subsample=0.8;, score=0.913 total time= 0.1s [CV 2/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.18614138328087154, learning_rate=0.01, max_depth=5, min_child_weight=1, n_estimators=50, subsample=0.8;, score=0.913 total time= 0.1s [CV 3/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.18614138328087154, learning_rate=0.01, max_depth=5, min_child_weight=1, n_estimators=50, subsample=0.8;, score=0.913 total time= 0.1s [CV 4/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.18614138328087154, learning_rate=0.01, max_depth=5, min_child_weight=1, n_estimators=50, subsample=0.8;, score=0.913 total time= 0.1s [CV 5/5] END colsample_bylevel=0.8, colsample_bytree=1.0, gamma=0.18614138328087154, learning_rate=0.01, max_depth=5, min_child_weight=1, n_estimators=50, subsample=0.8;, score=0.913 total time= 0.1s [CV 1/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.2428068767931133, learning_rate=0.1, max_depth=7, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.980 total time= 0.2s [CV 2/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.2428068767931133, learning_rate=0.1, max_depth=7, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.984 total time= 0.2s [CV 3/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.2428068767931133, learning_rate=0.1, max_depth=7, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.981 total time= 0.2s [CV 4/5] END colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.2428068767931133, learning_rate=0.1, max_depth=7, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.986 total time= 0.2s [CV 5/5] END 
colsample_bylevel=0.8, colsample_bytree=0.8, gamma=0.2428068767931133, learning_rate=0.1, max_depth=7, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.982 total time= 0.2s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.16269984907963386, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.983 total time= 0.2s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.16269984907963386, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.980 total time= 0.2s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.16269984907963386, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.979 total time= 0.2s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.16269984907963386, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.984 total time= 0.2s [CV 5/5] END colsample_bylevel=0.6, colsample_bytree=0.8, gamma=0.16269984907963386, learning_rate=0.1, max_depth=5, min_child_weight=1, n_estimators=300, subsample=0.6;, score=0.981 total time= 0.2s [CV 1/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3181663090929477, learning_rate=0.05, max_depth=8, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.981 total time= 0.1s [CV 2/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3181663090929477, learning_rate=0.05, max_depth=8, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.980 total time= 0.1s [CV 3/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3181663090929477, learning_rate=0.05, max_depth=8, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.978 total time= 0.1s [CV 4/5] END colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3181663090929477, learning_rate=0.05, max_depth=8, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.982 total time= 0.1s [CV 5/5] END 
colsample_bylevel=1.0, colsample_bytree=1.0, gamma=0.3181663090929477, learning_rate=0.05, max_depth=8, min_child_weight=2, n_estimators=100, subsample=0.8;, score=0.982 total time= 0.1s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.44602327758855664, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.979 total time= 0.0s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.44602327758855664, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.979 total time= 0.0s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.44602327758855664, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.980 total time= 0.0s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.44602327758855664, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.981 total time= 0.0s [CV 5/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.44602327758855664, learning_rate=0.1, max_depth=5, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.978 total time= 0.0s [CV 1/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.24625884690943195, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=1.0;, score=0.980 total time= 0.3s [CV 2/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.24625884690943195, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=1.0;, score=0.977 total time= 0.3s [CV 3/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.24625884690943195, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=1.0;, score=0.977 total time= 0.3s [CV 4/5] END colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.24625884690943195, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=1.0;, score=0.982 total time= 0.3s [CV 5/5] END 
colsample_bylevel=0.6, colsample_bytree=1.0, gamma=0.24625884690943195, learning_rate=0.01, max_depth=6, min_child_weight=1, n_estimators=300, subsample=1.0;, score=0.978 total time= 0.3s [CV 1/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.021801885877216876, learning_rate=0.05, max_depth=7, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.975 total time= 0.1s [CV 2/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.021801885877216876, learning_rate=0.05, max_depth=7, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.974 total time= 0.1s [CV 3/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.021801885877216876, learning_rate=0.05, max_depth=7, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.972 total time= 0.1s [CV 4/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.021801885877216876, learning_rate=0.05, max_depth=7, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.977 total time= 0.1s [CV 5/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.021801885877216876, learning_rate=0.05, max_depth=7, min_child_weight=2, n_estimators=50, subsample=0.8;, score=0.971 total time= 0.1s [CV 1/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.4765359235119766, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=50, subsample=1.0;, score=0.913 total time= 0.0s [CV 2/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.4765359235119766, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=50, subsample=1.0;, score=0.913 total time= 0.0s [CV 3/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.4765359235119766, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=50, subsample=1.0;, score=0.913 total time= 0.0s [CV 4/5] END colsample_bylevel=1.0, colsample_bytree=0.6, gamma=0.4765359235119766, learning_rate=0.01, max_depth=3, min_child_weight=2, n_estimators=50, subsample=1.0;, score=0.913 total time= 0.0s [CV 5/5] 
[RandomizedSearchCV verbose fold logs truncated for readability: across the sampled parameter sets, 5-fold F1 scores ranged from about 0.913 (e.g., learning_rate=0.01, n_estimators=50) up to about 0.986 (e.g., learning_rate=0.05, max_depth=4, n_estimators=300, subsample=0.8), with each fold completing in well under a second.]
RandomizedSearchCV(cv=5,
estimator=XGBClassifier(base_score=None, booster=None,
callbacks=None,
colsample_bylevel=None,
colsample_bynode=None,
colsample_bytree=None, device=None,
early_stopping_rounds=None,
enable_categorical=False,
eval_metric=None, feature_types=None,
gamma=None, grow_policy=None,
importance_type=None,
interaction_constraints=None,
learning_rate...
'colsample_bytree': [0.6, 0.8, 1.0],
'gamma': <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x2ce3ee330>,
'learning_rate': [0.01, 0.05, 0.1],
'max_depth': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x2fa2a8cb0>,
'min_child_weight': [1, 2, 4],
'n_estimators': [50, 100, 200, 300],
'subsample': [0.6, 0.8, 1.0]},
random_state=42, scoring='f1', verbose=3)
best_xgb = grid_obj.best_estimator_
best_xgb
XGBClassifier(base_score=None, booster=None, callbacks=None,
colsample_bylevel=0.6, colsample_bynode=None,
colsample_bytree=1.0, device=None, early_stopping_rounds=None,
enable_categorical=False, eval_metric=None, feature_types=None,
gamma=0.3022086896389086, grow_policy=None, importance_type=None,
interaction_constraints=None, learning_rate=0.1, max_bin=None,
max_cat_threshold=None, max_cat_to_onehot=None,
max_delta_step=None, max_depth=5, max_leaves=None,
min_child_weight=1, missing=nan, monotone_constraints=None,
multi_strategy=None, n_estimators=300, n_jobs=None,
num_parallel_tree=None, random_state=None, ...)
show_scores(best_xgb)
{'Training Recall': 1.0,
'Testing Recall': 0.9870638965111721,
'Training Precision': 1.0,
'Testing Precision': 0.9774844720496895,
'Training F1 Score': 1.0,
'Testing F1 Score': 0.982250828944802}
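`show_scores` is a helper defined earlier in the notebook; its body isn't shown in this section, but a minimal sketch consistent with the dictionaries it prints would look like the following. This is an assumption about its shape, not the notebook's actual implementation, and the `X_train`/`X_test` split here is a toy stand-in built from synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score, precision_score, f1_score

# Toy stand-ins for the notebook's train/test split
X_all, y_all = make_classification(n_samples=400, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=42
)

def show_scores(model):
    """Score an already-fitted model on both splits (assumed signature)."""
    train_pred = model.predict(X_train)
    test_pred = model.predict(X_test)
    return {
        "Training Recall": recall_score(y_train, train_pred),
        "Testing Recall": recall_score(y_test, test_pred),
        "Training Precision": precision_score(y_train, train_pred),
        "Testing Precision": precision_score(y_test, test_pred),
        "Training F1 Score": f1_score(y_train, train_pred),
        "Testing F1 Score": f1_score(y_test, test_pred),
    }

scores = show_scores(DecisionTreeClassifier(random_state=42).fit(X_train, y_train))
```

An unconstrained tree memorizes its training split, so the training metrics come out at 1.0, mirroring the tuned XGBoost output above.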
best_xgb_model_scores = show_scores(best_xgb)
best_xgb_model_scores = list(('Tuned XGBoost', *best_xgb_model_scores.values()))
comp_xgb = pd.DataFrame([best_xgb_model_scores], columns=['Model', 'Training Recall', 'Testing Recall', 'Training Precision', 'Testing Precision', 'Training F1 Score', 'Testing F1 Score'])
comp_df = pd.concat([comp_df, comp_xgb], ignore_index=True)
Best Parameters for XGBoost
XGBClassifier(base_score=None, booster=None, callbacks=None, colsample_bylevel=0.6, colsample_bynode=None, colsample_bytree=1.0, device=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=0.3022086896389086, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=0.1, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=5, max_leaves=None, min_child_weight=1, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=300, n_jobs=None, num_parallel_tree=None, random_state=None, ...)
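Rather than reading the winning hyperparameters out of the estimator repr, a fitted `RandomizedSearchCV` exposes them directly via `best_params_` and `best_score_`. A self-contained sketch of the pattern, using a `DecisionTreeClassifier` stand-in and synthetic data so it runs without xgboost:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X_toy, y_toy = make_classification(n_samples=300, random_state=42)

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_distributions={"max_depth": randint(2, 10),      # sampled distribution
                         "min_samples_leaf": [1, 3, 5]},   # sampled list
    n_iter=5, cv=3, scoring="f1", random_state=42,
)
search.fit(X_toy, y_toy)

best_params = search.best_params_  # dict of the winning draw
best_cv_f1 = search.best_score_    # mean CV F1 for that draw
```

The same two attributes on `grid_obj` would print the tuned XGBoost draw shown above without the long default-parameter noise.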
confusion_matrix_sklearn("Tuned XGBoost", best_xgb, X_test, y_test)
show_kfold_scores(best_xgb)
{'Training F1 Scores': array([0.98494983, 0.98324958, 0.98402019, 0.98072087, 0.98088113,
0.98993289, 0.97996661, 0.98245614, 0.98407376, 0.98242678]),
'F1 Repeatability (Training) %.3f%% (%.3f%%)': (98.32677780442921,
0.27002886761973083)}
best_xgb.fit(X, y)
# Get feature importances
importances = best_xgb.feature_importances_
# Create a DataFrame with feature names and importances
feature_importances = pd.DataFrame({'Feature': X.columns, 'Importance': importances})
# Sort ascending so the most important feature lands at the top of the horizontal bar chart
feature_importances = feature_importances.sort_values('Importance', ascending=True)
# Plot the feature importances
plt.figure(figsize=(10, 6))
plt.barh(feature_importances['Feature'], feature_importances['Importance'])
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importance')
plt.show()
Observations
CHOSEN MODEL
XGBoost showed a balance of false negatives and false positives similar to the other tuned models, but spread meaningful importance across a wider set of data points. This should reduce the risk of misclassifying Customers as either retained or lost. K-fold cross-validation also showed that XGBoost had the most repeatable scores, with a mean training F1 near 98% and a small spread across folds.
Stacking Score
stacking_model = StackingClassifier(estimators=[('best_regression', best_regression), ('best_adaboost', best_adaboost)], final_estimator=best_xgb)
stacking_model.fit(X_train, y_train)
StackingClassifier(estimators=[('best_regression',
RandomForestClassifier(max_depth=10,
max_features=0.5,
min_samples_leaf=3,
min_samples_split=6,
n_estimators=500,
n_jobs=-1,
random_state=42)),
('best_adaboost',
AdaBoostClassifier(learning_rate=0.3,
n_estimators=267,
random_state=42))],
final_estimator=XGBClassifier(base_score=None, booster=None,
callbacks=None...
gamma=0.3022086896389086,
grow_policy=None,
importance_type=None,
interaction_constraints=None,
learning_rate=0.1,
max_bin=None,
max_cat_threshold=None,
max_cat_to_onehot=None,
max_delta_step=None,
max_depth=5, max_leaves=None,
min_child_weight=1,
missing=nan,
monotone_constraints=None,
multi_strategy=None,
n_estimators=300, n_jobs=None,
num_parallel_tree=None,
random_state=None, ...))
stacking_score = f1_score(y_train, stacking_model.predict(X_train))
print("Stacking - F1 Training: {}".format(stacking_score))
stacking_val = f1_score(y_test, stacking_model.predict(X_test))
print("Stacking - F1 Test: {}".format(stacking_val))
Stacking - F1 Training: 0.9881840274868013
Stacking - F1 Test: 0.9808668488871535
confusion_matrix_sklearn("Stacking", stacking_model, X_test, y_test)
show_scores(stacking_model)
{'Training Recall': 0.9910909396537233,
'Testing Recall': 0.9847118776950216,
'Training Precision': 0.9852941176470589,
'Testing Precision': 0.9770517308440295,
'Training F1 Score': 0.9881840274868013,
'Testing F1 Score': 0.9808668488871535}
stacking_model_scores = show_scores(stacking_model)
stacking_model_scores = list(('Stacking with Tuned', *stacking_model_scores.values()))
comp_stacking = pd.DataFrame([stacking_model_scores], columns=['Model', 'Training Recall', 'Testing Recall', 'Training Precision', 'Testing Precision', 'Training F1 Score', 'Testing F1 Score'])
comp_df = pd.concat([comp_df, comp_stacking], ignore_index=True)
Overview
The final models chosen were RandomForest, AdaBoost, and XGBoost classifiers, with two additional models as an exercise: XGBoost tuned with Optuna, and a Stacking classifier, to see whether combining all of the models could yield better results.
Measuring the Models
Initially, the idea was to focus on reducing false negatives by maximizing the Recall score. However, with the exception of the XGBoost model tuned with RandomizedSearchCV, maximizing Recall alone failed to correctly identify Customers who won't churn, effectively flagging much of the data as false positives. The Bank could still use such a model to target Customers at risk of attrition, but it would waste resources on Customers who were never going to leave. Therefore, the models are measured on a balance between the F1 score and Recall.
comp_df.T
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Model | Tuned Random Forest | Tuned AdaBoost | Tuned XGBoost | Stacking with Tuned |
| Training Recall | 0.99563 | 0.986552 | 1.0 | 0.991091 |
| Testing Recall | 0.984712 | 0.985104 | 0.987064 | 0.984712 |
| Training Precision | 0.98865 | 0.975727 | 1.0 | 0.985294 |
| Testing Precision | 0.967643 | 0.97102 | 0.977484 | 0.977052 |
| Training F1 Score | 0.992127 | 0.98111 | 1.0 | 0.988184 |
| Testing F1 Score | 0.976103 | 0.978011 | 0.982251 | 0.980867 |
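The winner in the table above can also be picked programmatically with `idxmax` on the untransposed frame. A small sketch that rebuilds the comparison from the printed values (hard-coded here so the snippet is self-contained):

```python
import pandas as pd

# Values copied from the comparison table above
comp_df = pd.DataFrame({
    "Model": ["Tuned Random Forest", "Tuned AdaBoost",
              "Tuned XGBoost", "Stacking with Tuned"],
    "Testing Recall": [0.984712, 0.985104, 0.987064, 0.984712],
    "Testing F1 Score": [0.976103, 0.978011, 0.982251, 0.980867],
})

# Row with the highest held-out F1 score
best_row = comp_df.loc[comp_df["Testing F1 Score"].idxmax()]
best_model_name = best_row["Model"]  # "Tuned XGBoost"
```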
Model Choice: XGBoost using RandomizedSearchCV
As discussed above, the model that most reduces false negatives isn't necessarily the best, since acting on every flagged Customer takes considerable resources. What's more important is a model that balances the false negatives, correctly identifying Customers who will stay, while minimizing the false positives.
# NOTE: this refits the model on the test split itself, so the test-set
# metrics below are inflated by data leakage and hit 1.0 by construction.
chosen_model = best_xgb.fit(X_test, y_test)
test_performance = {
"Testing Accuracy": accuracy_score(y_test, chosen_model.predict(X_test)),
"Testing Recall": recall_score(y_test, chosen_model.predict(X_test)),
"Testing Precision": precision_score(y_test, chosen_model.predict(X_test)),
"Testing F1 Score": f1_score(y_test, chosen_model.predict(X_test)),
}
test_performance
{'Testing Accuracy': 1.0,
'Testing Recall': 1.0,
'Testing Precision': 1.0,
'Testing F1 Score': 1.0}
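The perfect scores above are an artifact of refitting `best_xgb` on the test split in the previous cell, not evidence of generalization. A small self-contained sketch of why fitting on the evaluation set inflates metrics, using a decision-tree stand-in on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = make_classification(n_samples=500, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.3, random_state=0)

# Honest protocol: fit on train, score on held-out test data
honest = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
honest_acc = accuracy_score(y_te, honest.predict(X_te))

# Leaky protocol (what the cell above does): fit on test, score on test
leaky = DecisionTreeClassifier(random_state=0).fit(X_te, y_te)
leaky_acc = accuracy_score(y_te, leaky.predict(X_te))  # memorized: 1.0
```

The K-fold numbers further below, computed without this leak, are the figures worth reporting.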
confusion_matrix_sklearn("Tuned XGBoost w/ RandomizedSearchCV", chosen_model, X_test, y_test)
chosen_model.fit(X, y)
# Get feature importances
importances = chosen_model.feature_importances_
# Create a DataFrame with feature names and importances
feature_importances = pd.DataFrame({'Feature': X.columns, 'Importance': importances})
# Sort ascending so the most important feature lands at the top of the horizontal bar chart
feature_importances = feature_importances.sort_values('Importance', ascending=True)
# Plot the feature importances
plt.figure(figsize=(10, 6))
plt.barh(feature_importances['Feature'], feature_importances['Importance'])
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importance')
plt.show()
# defining kfold
kfold = StratifiedKFold(n_splits=10, random_state=42, shuffle = True)
results = cross_val_score(chosen_model, X_train, y_train, cv=kfold, scoring='f1')
test_results = cross_val_score(chosen_model, X_test, y_test, cv=kfold, scoring = 'f1')
print("Training:", results)
print("Testing:", test_results)
print("F1 Repeatability (Training): %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))
print("F1 Repeatability (Testing): %.3f%% (%.3f%%)" % (test_results.mean()*100.0, test_results.std()*100.0))
Training: [0.98494983 0.98324958 0.98402019 0.98072087 0.98088113 0.98993289 0.97996661 0.98245614 0.98407376 0.98242678]
Testing: [0.97674419 0.97276265 0.9787234 0.98635478 0.97674419 0.98832685 0.98046875 0.984375 0.96699029 0.97864078]
F1 Repeatability (Training): 98.327% (0.270%)
F1 Repeatability (Testing): 97.901% (0.605%)
The XGBoost model's apparent 100% accuracy in the final test cell is an artifact of refitting on the test split and should not be read as generalization performance. The trustworthy figures are the cross-validated ones: the model categorizes existing and attrited customers with an F1 of about 98.3% on training folds and 97.9% on testing folds, with small variance across folds.
The biggest revelation from the data is that the less a Customer utilizes the services of the Credit Card company, the more likely they are to close their account(s). This is evidenced by the chosen model's Feature Importance plot, which illustrates which data points the model found most useful for determining whether a Customer will close their account. As detailed below, the key to Customer retention is to improve their utilization of the credit card in every facet: less usage means more leaving. More insights and recommendations are provided below.
Other Insights from Data
The most influential data points were the total transaction count (Total_Trans_Ct), the revolving balance (Total_Revolving_Bal), the average card utilization ratio (how much the Customer spent compared to their credit limit, Avg_Utilization_Ratio), and the transaction amount in the previous 12 months (Total_Trans_Amt). In each of the final three models evaluated, these data points ranked among the most important. It's important to focus on the following in relation to each:
- Transaction count (Total_Trans_Ct): Customers at risk of attrition had a lower overall number of transactions. Even including the outlier attrited customers, the value never exceeded 100 total transactions, whereas existing customers went as high as around 140, with a majority hovering around 70.
- Revolving balance (Total_Revolving_Bal): This is a harder one to monitor, as the range for both existing and attrited customers runs from $0 to $2,500. However, about 75% of attrited Customers don't exceed roughly $1,300. This data point needs further investigation, as the available credit limit could also play a role.
- Utilization ratio (Avg_Utilization_Ratio): About 75% of attrited Customers keep their overall utilization low (around 22%), but some go as high as 100% utilization. Those exceeding roughly 58% are outliers and warrant further study. As with the other data points, a low utilization ratio could reflect fiscal responsibility, or these Customers may simply prefer other payment methods or competitors' credit cards, which would lead them to close their account.
- Transaction amount (Total_Trans_Amt): As with all the other data points, the more that's transacted, the less likely a Customer is to leave.
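Threshold claims like "75% of attrited Customers don't exceed roughly $1,300 of revolving balance" come from percentile cuts on the source data. The pattern for checking them is a quantile plus a quartile groupby; sketched here on synthetic stand-in data, since the bank CSV isn't loaded in this cell (the column names mirror the real ones, the values are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in: churn probability falls as the revolving balance rises
df = pd.DataFrame({"Total_Revolving_Bal": rng.uniform(0, 2500, 1000)})
df["Attrited"] = rng.random(1000) < (0.4 - df["Total_Revolving_Bal"] / 10000)

# 75th-percentile balance among attrited customers (the "$1,300"-style figure)
p75 = df.loc[df["Attrited"], "Total_Revolving_Bal"].quantile(0.75)

# Attrition rate within each balance quartile
by_quartile = df.groupby(pd.qcut(df["Total_Revolving_Bal"], 4), observed=True)["Attrited"].mean()
```

On the real frame, the same two lines over Avg_Utilization_Ratio, Total_Trans_Ct, and Total_Trans_Amt would quantify each bullet above.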